ABSTRACT

In this paper, we present what are, to our knowledge, the first
I/O efficient solutions for computing the k-bisimulation
partition of a massive directed graph, and for maintaining
such a partition upon updates to the underlying
graph. Bisimulation is a robust notion of node equivalence
which is ubiquitous in the theory and application of graph
data. It defines an intuitive notion of nodes in a graph
sharing fundamental structural features. We consider in
particular k-bisimulation, which is the standard variant of
bisimulation where the topological features of nodes are only
considered within a local neighborhood of radius $k \ge 0$.

The I/O cost of our partition construction algorithm is
bounded by $O(k \cdot sort(|E_t|) + k \cdot scan(|N_t|) + sort(|N_t|))$,
while our maintenance algorithms are bounded by
$O(k \cdot sort(|E_t|) + k \cdot sort(|N_t|))$. The space complexity bounds
are $O(|N_t| + |E_t|)$ and $O(k \cdot |N_t| + k \cdot |E_t|)$, resp. Here, $|E_t|$
and $|N_t|$ are the number of disk pages occupied by the input
graph's edge set and node set, resp., and $sort(n)$ and $scan(n)$
are the cost of sorting and scanning, resp., a file occupying $n$
pages in external memory. Empirical analysis on a variety of
massive real-world and synthetic graph datasets shows that
our algorithms not only perform efficiently, but also scale
gracefully as graphs grow in size.

1 Introduction 

Massive graph-structured datasets are becoming increas- 
ingly common in a wide range of applications. Examples 
such as social networks, web graphs, web of data, biological 
networks and so forth have drawn a lot of attention in 
both industry and academic research. In reasoning over 
graphs, a fundamental and ubiquitous notion is that of 
bisimulation, which is a characterization of when two nodes 
in a graph share basic structural properties such as labels 
and neighborhood connectivity. Indeed, bisimulation arises
and is widely adopted in a surprisingly large range of 
research fields [23]. In data management, for instance,
bisimulation partitioning (i.e., grouping together bisimilar 
nodes) is often a basic step in indexing semi-structured 
datasets [18], and also finds fundamental applications in 
RDF [20] and general graph data (e.g., compression [?] |S], 
query processing [16], data analytics JT*). 

Consequently, algorithms for bisimulation partitioning 
have been studied for decades. Indeed, well-known algo- 
rithms such as that of Paige and Tarjan [TS] and more 
recent work (e.g., [6]) have effective theoretical behavior. 
In practice, however, these solutions face two significant 
difficulties. 

First, with the exception of paper [15], all known 
approaches for computing bisimulation are internal-memory 
based solutions. As such, their inherently random memory 
access patterns do not translate to efficient I/O-bound 
solutions, where it is crucial to avoid such access patterns. 

Second, it is often the case that bisimulation partitions 
on real data are still too costly to compute and maintain, 
and, furthermore, the resulting partitions are too refined for 
effective use. Hence, a notion of localized k-bisimulation
has proven to be quite successful in data management
applications (e.g., [10], [16], [21], [25]). k-bisimulation is a
variant of bisimulation where the topological features of
nodes are only considered within a local neighborhood of
radius $k \ge 0$. With a pay-as-you-go nature, k-bisimulation
is cheaper to compute and maintain, cost adjustable, and
faithfully representative of the bisimulation partition within
the local neighborhood.

All known centralized solutions for computing k-bisimulation
partitions (again, with the exception of [15]) are also
internal-memory based. The reality is that, nowadays,
many graphs are too large to be processed in main memory.
Indeed, massive graphs are ubiquitous (e.g., linked data,
biological networks, social networks, to name a few [7], [14]).
It is quite common for a researcher to encounter a graph with 
billions of nodes and edges. Furthermore, the size of graphs 
will only continue to grow as technologies for generating and 
capturing data continue to improve and proliferate. We can 
safely conclude that it will become increasingly infeasible to 
apply existing bisimulation partition algorithms in practice. 



To process real graph datasets, therefore, we must 
necessarily turn to either external memory, distributed, 
or parallel solutions. While there has been some work
on parallel (e.g., [22], [24]) and distributed (e.g., [3])
approaches to bisimulation computation, and, recently,
external memory solutions on restricted acyclic and
tree-structured graphs [15], there has been to our knowledge
no work on computing bisimulation and k-bisimulation
partitions on arbitrary graph structures in external memory.

Given these motivations, we have studied external memory
solutions for reasoning about k-bisimulation on arbitrary
graphs. In this paper, we present the results of our study, 
which makes the following high-level contributions. 

• We present the first known I/O efficient external memory
based algorithm for constructing the k-bisimulation
partition of a disk-resident graph. The I/O cost
of this algorithm is bounded by
$O(k \cdot sort(|E_t|) + k \cdot scan(|N_t|) + sort(|N_t|))$, with space complexity
$O(|N_t| + |E_t|)$, where $|E_t|$ and $|N_t|$ are the number
of disk pages occupied by the input graph's edge set
and node set, resp., and $sort(n)$ and $scan(n)$ are the
cost of sorting and scanning, resp., a file occupying $n$
pages in external memory.

• We present the first known I/O efficient external
memory based algorithms for performing maintenance
on a disk-resident k-bisimulation graph partition, with
I/O cost bounded by $O(k \cdot sort(|E_t|) + k \cdot sort(|N_t|))$,
and space complexity $O(k \cdot |N_t| + k \cdot |E_t|)$.

• We present the results of an extensive empirical 
analysis of our solutions on a variety of massive real- 
world and synthetic graph datasets, showing that our 
algorithms not only perform efficiently, but also scale 
gracefully as graphs grow in size. For example, the 
10-bisimulation partition of a graph having 1.4 billion 
edges can be computed with our solution within a day 
on commodity hardware, while this would take weeks, 
if not months, for a traditional in-memory algorithm 
to accomplish in the same environment. 

We note that parallel and distributed solutions could also 
benefit from the basic novel ideas developed here; we leave 
such explorations open as avenues for future research. 

The rest of the paper is organized as follows. In the next 
section we give our basic definitions and data structures 
used. We then describe in Section 3 our solution for
constructing the localized bisimulation partition. Next, Section
4 presents algorithms for keeping an existing partition up to
date, in the face of updates to the underlying graph. Section
5 presents the results of our empirical study of all algorithms.
We then conclude in Section 6 with a discussion of future
directions for research.

2 Preliminaries 

2.1 Data model and definitions 

Our data model is that of finite directed node- and edge-labeled
graphs $(N, E, \lambda_N, \lambda_E)$, where $N$ is a finite set of
nodes, $E \subseteq N \times N$ is a set of edges, $\lambda_N$ is a function from
$N$ to a set of node labels $\mathcal{L}_N$, and $\lambda_E$ is a function from $E$
to a set of edge labels $\mathcal{L}_E$.

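Definition 1. Let $k$ be a non-negative integer and
$G = (N, E, \lambda_N, \lambda_E)$ be a graph. Nodes $u, v \in N$ are called
k-bisimilar (denoted as $u \approx^k v$), iff the following holds:

1. $\lambda_N(u) = \lambda_N(v)$,

2. if $k > 0$, then $\forall u' \in N\,[(u, u') \in E \Rightarrow \exists v' \in N\,[(v, v') \in E,\ u' \approx^{k-1} v'$ and $\lambda_E(u, u') = \lambda_E(v, v')]]$, and

3. if $k > 0$, then $\forall v' \in N\,[(v, v') \in E \Rightarrow \exists u' \in N\,[(u, u') \in E,\ v' \approx^{k-1} u'$ and $\lambda_E(v, v') = \lambda_E(u, u')]]$.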



It can be easily shown that the k-bisimilar relation is an
equivalence relation.

We illustrate Definition 1 with an example. Consider the
graph given in Figure 1. It is a small social network graph,
in which nodes 1 and 2 are 0- and 1-bisimilar but not
2-bisimilar.
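To make the example concrete, the following is a minimal sketch that checks Definition 1 directly by recursion. The edge list is read off the edge table of Table 2(b); the function names (`out_edges`, `k_bisimilar`) are illustrative assumptions and are not part of the algorithms of this paper, which avoid this kind of pairwise, random-access checking.

```python
from functools import lru_cache

# Example graph of Figure 1; edges as listed in Table 2(b).
LABELS = {1: "M", 2: "M", 3: "P", 4: "P", 5: "P", 6: "P"}
EDGES = [(3, "l", 1), (1, "w", 2), (2, "w", 2), (5, "l", 2),
         (4, "l", 3), (1, "l", 4), (2, "l", 6)]


def out_edges(u):
    """Return the (edge label, target node) pairs of node u."""
    return [(l, t) for (s, l, t) in EDGES if s == u]


@lru_cache(maxsize=None)
def k_bisimilar(u, v, k):
    """Naive check of Definition 1 (illustration only, in-memory)."""
    if LABELS[u] != LABELS[v]:          # condition 1
        return False
    if k == 0:
        return True

    def simulated(a, b):
        # every edge of a is matched by an equally labeled edge of b
        # whose target is (k-1)-bisimilar to the target of a's edge
        return all(any(l1 == l2 and k_bisimilar(t1, t2, k - 1)
                       for (l2, t2) in out_edges(b))
                   for (l1, t1) in out_edges(a))

    return simulated(u, v) and simulated(v, u)   # conditions 2 and 3
```

Nodes 1 and 2 stop being equivalent at k = 2 because their l-successors (nodes 4 and 6) are not 1-bisimilar: node 4 has an outgoing edge while node 6 has none.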




Figure 1: Example graph of a social network, where nodes 1 and
2 have label M (short for "manager"), and the other nodes have
label P (short for "people"). The edge label l is short for "likes",
while w is short for "works for".



Recall from Section 1 that our interest in this paper is in
computing the k-bisimulation partition of a massive graph,
and performing maintenance on the result under updates to
the original graph. By massive, we mean that both the set
of nodes and the set of edges of the graph are too big to fit
into main memory. By a partition of the graph, we mean an
assignment of each node u of the graph to a partition block,
which is the unique subset of nodes in the graph whose
members are k-bisimilar to u.

In particular, we are interested in constructing partition 
"identifiers." 

Definition 2. A k-partition identifier for graph
$G = (N, E, \lambda_N, \lambda_E)$ and $k \ge 0$ is a set of $k + 1$ functions
$\mathcal{P} = \{pId_0, \ldots, pId_k\}$ such that, for each $0 \le i \le k$, $pId_i$ is a
function from $N$ to the integers, and, for all nodes $u, v \in N$,
it holds that $pId_i(u) = pId_i(v)$ iff $u \approx^i v$.

A fundamental tool in our reasoning about k-bisimulation
is the notion of node signatures.

Definition 3. Let $G = (N, E, \lambda_N, \lambda_E)$ be a graph, $k \ge 0$,
and $\mathcal{P} = \{pId_0, \ldots, pId_k\}$ be a k-partition identifier for $G$.
The k-bisimulation signature of node $u \in N$ is the pair
$sig_k(u) = (pId_0(u), L)$ where:

$$L = \begin{cases} \emptyset & \text{if } k = 0, \\ \{(\lambda_E(u, u'),\ pId_{k-1}(u')) \mid (u, u') \in E\} & \text{if } k > 0. \end{cases}$$



We then have the following fact. 

Proposition 1. $pId_k(u) = pId_k(v)$ iff $sig_k(u) = sig_k(v)$
($k \ge 0$).

A proof of Proposition 1 can be found in Appendix A.1.

Proposition 1 is the basis of all algorithms in this paper.
The basic idea is that a node's k-bisimulation partition
block can be determined by its k-bisimulation signature,
which in turn is determined by the (k − 1)-bisimulation
partition of the graph. Intuitively, in order to compute

Footnote: Note that we use $\lambda_E(u, u')$, instead of $\lambda_E((u, u'))$, for ease
of readability.



the k-bisimulation partition, we compute the graph's
j-bisimulation ($0 \le j \le k$) partitions bottom-up, starting
from j = 0. We call each such intermediate computation
the iteration j computation.

It is straightforward to show that the k-bisimulation
partition of a graph is unique. Hence, in the sequel, we can
safely talk about k-partition identifiers as unique objects.
Also, note that we will use integer node identifier values
(denoted as uId) to designate nodes $u \in N$. Therefore, in
the following discussions the functions $sig_k$ and $pId_k$ can
both take node identifiers (i.e., integers) as input.
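The bottom-up idea behind Proposition 1 can be sketched in a few lines of in-memory Python. This is only an illustration under stated assumptions: the graph fits in memory, one global counter hands out partition identifiers, and a fresh dictionary plays the role of the signature store in every iteration. The name `partition_ids` is hypothetical. With these choices the sketch reproduces the identifiers shown in Table 1.

```python
from itertools import count

# Example graph of Figure 1; edges as listed in Table 2(b).
LABELS = {1: "M", 2: "M", 3: "P", 4: "P", 5: "P", 6: "P"}
EDGES = [(3, "l", 1), (1, "w", 2), (2, "w", 2), (5, "l", 2),
         (4, "l", 3), (1, "l", 4), (2, "l", 6)]


def partition_ids(labels, edges, k):
    """Return [pId_0, ..., pId_k], each a dict from node ID to partition ID."""
    fresh = count(1)                       # global counter (an assumption)
    label_ids, pid0 = {}, {}
    for u in sorted(labels):               # iteration 0: group by node label
        if labels[u] not in label_ids:
            label_ids[labels[u]] = next(fresh)
        pid0[u] = label_ids[labels[u]]
    pids = [pid0]
    for _ in range(k):                     # iterations 1..k
        store, prev, cur = {}, pids[-1], {}
        for u in sorted(labels):
            # Definition 3: (pId_0(u), {(edge label, pId_{j-1}(target))})
            sig = (pid0[u],
                   frozenset((l, prev[t]) for (s, l, t) in edges if s == u))
            if sig not in store:
                store[sig] = next(fresh)
            cur[u] = store[sig]
        pids.append(cur)
    return pids
```

The per-node edge lookup here is a linear scan for brevity; the algorithms of Section 3 replace it with sorting and merge joins precisely to avoid random access.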

Table 1 shows one way of assigning k-bisimulation (k = 0, 1, 2)
partition identifiers and signatures for the example
graph in Figure 1, where nId denotes the unique
identifier of each node, and $pId_i(nId)$ and $sig_j(nId)$
($0 \le i \le 2$ and $0 < j \le 2$) are presented accordingly.
For k = 0, nodes are grouped into two partitions by node
labels (given identifiers 1 and 2). Then for k = 1, 2,
signatures are constructed according to Definition 3 and
then distinct partition identifiers are assigned to distinct
signatures, following Proposition 1.

Table 1: k-bisimulation for the example graph in Figure 1 (k = 0, 1, 2)

nId  pId_0(nId)  sig_1(nId)         pId_1(nId)  sig_2(nId)         pId_2(nId)
1    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,5)}    7
2    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,6)}    8
3    2           2,{(l,1)}          4           2,{(l,3)}          9
4    2           2,{(l,2)}          5           2,{(l,4)}          10
5    2           2,{(l,1)}          4           2,{(l,3)}          9
6    2           2,{}               6           2,{}               11

2.2 Data structures

We assume that graphs are saved on disk in the form of
fixed column tables (node set as table Nt and edge set as
table Et). We also assume that these tables can have several
copies sorted on different columns. In later discussions, we
will use the notation X.y to refer to column y of table X.
We have the following possible attributes for Nt:

nId          node identifier (note that this is the same
             as the row identifier in the table; we leave this
             attribute here for clarity of the discussion)
nLabel       node label
pIdold_nId   bisimulation partition identifier for the given
             nId from the last computation iteration
pIdnew_nId   bisimulation partition identifier for the given
             nId from the current computation iteration
pIdj_nId     j-bisimulation partition identifier for the
             given nId (j = 0, 1, ..., k)

and for Et:

sId          source node identifier
tId          target node identifier
eLabel       edge label
pIdold_tId   bisimulation partition identifier for the given
             tId from the last computation iteration

We further assume that we have a signature storage
facility S, which stores the mapping between signatures
and their corresponding partition identifiers. S is a
data structure having only one idempotent function called
S.insert(). For node $u \in N$, S.insert() takes $sig_j(u)$
($0 \le j \le k$) as input, and provides $pId_j(u)$ as output.
Essentially S.insert() implements the one-to-one mapping
function from $sig_j$ to $pId_j$. The implementation details of S
will be discussed in Section 3.2.

For ease of discussion and investigation, we assume in
what follows that the node and edge tables are each just one
file sequentially filled with fixed length records. Moreover,
in this paper we make use of sort merge join to the
extent possible, since it is a very basic way to achieve
I/O efficient results. However, many possibilities could
be explored for implementing these data structures (e.g.,
indexing techniques) and join algorithms to further optimize
our presented results. We leave such investigations open for
future research.

Finally, we also assume that we have a (possibly external
memory based) priority queue available. In our empirical
study below, we use the off-the-shelf I/O efficient priority
queue implementation provided by the open source STXXL
library [5].

2.3 Cost model



Since our focus is on disk-resident datasets, we use standard 
I/O complexity notions to analyze our algorithms 01. The 
primary concern here is to minimize the number of I/Os 
needed to complete the task at hand. 

Suppose we have table X, space to hold B disk pages in 
internal memory, and X occupies \X\ pages on disk. In what 
follows, we will use the following notation: 

• $sort(|X|)$ denotes the number of I/Os when sorting
table X on some given column(s). This will take
$2|X|(1 + \lceil \log_{B-1} \lceil |X|/B \rceil \rceil)$ I/Os for a standard external
memory based merge sort.

• scan{\X\) denotes the number of I/Os when scanning 
over table X. This will take \X\ I/Os. 

• $search(|X|)$ denotes the number of I/Os when searching
for some key in some columns of X. This
operation's cost will vary from a constant number of
I/Os to $|X|$ I/Os, depending on how we implement X,
whether X is sorted on the columns being searched, whether
we have an index on those columns, and whether we
want the whole result set or only the first occurrence
of the key.
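As a rough numeric aid, the notation above can be turned into a small calculator. This is a sketch under stated assumptions: page counts are integers, the buffer holds at least three pages, and `build_bisim_bound` simply evaluates the expression inside the construction bound with all hidden constants set to one, so its output is an order-of-magnitude indication rather than a prediction. All function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def scan_cost(pages):
    """scan(|X|): one I/O per page."""
    return pages


def sort_cost(pages, buffer_pages):
    """sort(|X|) = 2|X|(1 + ceil(log_{B-1} ceil(|X|/B))) for external merge sort."""
    runs = ceil_div(pages, buffer_pages)       # sorted runs after the first pass
    merge_passes = 0
    while runs > 1:                            # each pass merges B-1 runs at a time
        runs = ceil_div(runs, buffer_pages - 1)
        merge_passes += 1
    return 2 * pages * (1 + merge_passes)


def build_bisim_bound(k, nt_pages, et_pages, buffer_pages):
    """k*sort(|Et|) + k*scan(|Nt|) + sort(|Nt|), ignoring hidden constants."""
    return (k * sort_cost(et_pages, buffer_pages)
            + k * scan_cost(nt_pages)
            + sort_cost(nt_pages, buffer_pages))
```

Counting merge passes with repeated ceiling division avoids floating point logarithms, which can be off by one near exact powers of B − 1.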

3 Constructing the localized bisimulation 
partition 

We present our algorithm for k-bisimulation partition
computation in Algorithm 1. The algorithm follows directly from
Proposition 1: for each node in the input graph, it
constructs the node's signature and maps that signature
one-to-one to a number (the partition identifier).

In iteration j = 0, we assign distinct partition identifiers
to nodes based on their nLabels. For the other iterations j > 0,
the algorithm mainly performs two things for each node ID
$uId \in \pi_{nId}(N_t)$ (lines 14 to 17): (1) construct $sig_j(uId)$;
and (2) insert $sig_j(uId)$ into S and record the returned $pId_j(uId)$
in the corresponding row of Nt. To prepare the necessary
information for constructing $sig_j(uId)$, we need to fill in the
missing columns of Et (lines 9 to 10). Several scans and sorts
on tables are involved in each iteration. Note that some
operations in the algorithm can be merged into one in practice.
We present them separately just to make the presentation
clearer. A detailed description is given in Section 3.1.



Algorithm 1 Compute the k-bisimulation equivalence classes of a graph

 1: procedure BUILD_BISIM(Nt, Et, k)
 2:   if k = 0 then
 3:     fill in the pId0_nId and pIdnew_nId columns of Nt    ▷ O(sort(|Nt|)) + O(scan(|Nt|))
 4:     return (Nt, Et)
 5:   (Nt, Et) ← BUILD_BISIM(Nt, Et, k − 1)    ▷ k > 0, recursive call
 6:   if k = 1 then
 7:     Nt ← sort(Nt) by nId    ▷ O(sort(|Nt|))
 8:     Et ← sort(Et) by tId    ▷ O(sort(|Et|))
 9:   scan Nt, move content of column pIdnew_nId to pIdold_nId    ▷ O(scan(|Nt|))
10:   fill in the pIdold_tId column of Et    ▷ O(scan(|Et|)) + O(scan(|Nt|))
11:   initialize S
12:   F ← π_α(Et), where α = (sId, eLabel, pIdold_tId)
13:   F ← sort(F) by sId, eLabel, pIdold_tId, removing duplicates    ▷ O(sort(|Et|))
14:   for each uId ∈ π_nId(Nt) do    ▷ overall O(scan(|Et|)) + O(scan(|Nt|)) + cost of S
15:     construct sig_k(uId) from F    ▷ merge join with F
16:     pId_k(uId) ← S.insert(sig_k(uId))
17:     record pId_k(uId) in the pIdnew_nId column of Nt where nId = uId
18:   return (Nt, Et)



3.1 Details of Algorithm 1 (BUILD_BISIM())

3.1.1 Input and output

The input variables of Algorithm 1 are node table Nt, edge
table Et and k, which is the degree of local bisimilarity from
Definition 1. The output variables are Nt and Et. The
schema of Nt is (nId, nLabel, pId0_nId, pIdold_nId, pIdnew_nId);
the schema of Et is (sId, eLabel, tId, pIdold_tId).

3.1.2 k = 0, lines 2 to 4

According to Definition 1, k = 0 means that nodes having the
same label should be assigned the same partition identifier.
We achieve this by sorting Nt on the nLabel column. When
scanning Nt, for each new nLabel we encounter, we request a
new integer (e.g., from a predefined counter) and assign it to
every nId carrying that label, filling it in the pId0_nId and
pIdnew_nId columns. This
will take $O(sort(|N_t|)) + O(scan(|N_t|))$ I/Os. Note that
since pId0_nId can be assigned during the last step of the
sorting process, the scanning cost $O(scan(|N_t|))$ can be
omitted. Also note that one alternative way of assigning
$pId_0$ is to use a hash map. We can create a hash map using
nLabel as the keys, and $pId_0$ as the values. The upper bound
of this method is the same as the one we present here.

details of line 3 of Algorithm 1

  sort Nt by nLabel    ▷ O(sort(|Nt|))
  create variable current_pId
  for all (nId, nLabel, pId0_nId, pIdold_nId, pIdnew_nId) ∈ Nt do    ▷ O(scan(|Nt|))
    if nLabel is new then
      current_pId ← request a new pId
    save current_pId to pId0_nId and pIdnew_nId
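The sort-then-scan step for iteration 0 can be sketched as follows. It is an in-memory stand-in: Python's `sorted` takes the place of the external merge sort, and the row layout `(nId, nLabel)` and the name `assign_pid0` are assumptions made for the illustration.

```python
def assign_pid0(node_table):
    """node_table: list of (nId, nLabel) rows.

    Returns rows (nId, nLabel, pId0) in nLabel order, giving every
    distinct label its own partition identifier.
    """
    rows = sorted(node_table, key=lambda r: (r[1], r[0]))  # stands in for the external sort
    out = []
    current_label, current_pid, next_pid = None, None, 1
    for n_id, n_label in rows:                 # one sequential scan
        if n_label != current_label:           # a new label starts a new block
            current_label, current_pid = n_label, next_pid
            next_pid += 1
        out.append((n_id, n_label, current_pid))
    return out
```

Because equal labels are adjacent after sorting, the scan only needs to remember the previous label, which is what keeps the step free of random access.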

3.1.3 k > 0, lines 5 to 18

For k > 0, we first perform a recursive call to the algorithm,
ensuring we work in a bottom-up manner. For iteration 1
(k = 1), we sort Nt and Et on nId and tId, resp., preparing them
for later merge join operations. The algorithm's idea is to
construct the signature of each node in order to distinguish
it from other nodes according to the k-bisimilar relation. If
we can properly fill in the pIdold_tId column of Et, and join it
with Nt on nId = sId, the information combined from columns
{pId0_nId, eLabel, pIdold_tId} is enough for constructing the
signature. The column eLabel is already filled in before
the algorithm starts. The column pId0_nId is filled in during
iteration 0 (lines 2 to 4). The column pIdold_tId is filled in
during each iteration j > 0 (line 10). Then for each node
ID $uId \in N_t$, we construct its $sig_k(uId)$, insert it into S in an I/O
efficient way, get $pId_k(uId)$ in return, and place
this value in the pIdnew_nId column of Nt.

At line 10 of Algorithm 1, to fill in the pIdold_tId column
of Et, we conduct a sort merge join of Et and Nt (since
both tables are sorted properly in iteration 1), replacing the
content of pIdold_tId in Et with pIdold_nId in Nt.

details of line 10 of Algorithm 1

  Et ← π_α(Et ⋈_φ Nt)    ▷ merge join of Et and Nt
    α : (Et.sId, Et.eLabel, Et.tId, Nt.pIdold_nId)
    φ : Et.tId = Nt.nId
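The merge join at line 10 can be pictured with two forward-only cursors. The sketch below is an in-memory illustration under assumptions: the edge rows are `(sId, eLabel, tId)` sorted by tId, the node rows are `(nId, pIdold_nId)` sorted by nId, and every edge target exists in the node table. The name `fill_pid_old_tid` is hypothetical.

```python
def fill_pid_old_tid(edge_rows, node_rows):
    """Merge join of Et (sorted by tId) with Nt (sorted by nId).

    Returns rows (sId, eLabel, tId, pIdold_tId).
    """
    out, i = [], 0
    for s_id, e_label, t_id in edge_rows:
        while node_rows[i][0] < t_id:      # node cursor only moves forward
            i += 1
        if node_rows[i][0] != t_id:        # assumption: targets are known nodes
            raise ValueError(f"edge target {t_id} missing from node table")
        out.append((s_id, e_label, t_id, node_rows[i][1]))
    return out
```

Run on the example graph with the iteration 1 identifiers of Table 1 as pIdold_nId, this yields exactly the edge table shown in Table 2(b). Each input is read once, which matches the $O(scan(|E_t|)) + O(scan(|N_t|))$ annotation in Algorithm 1.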

At line 15 of Algorithm 1 we sequentially construct the
signature $sig_k(uId)$ for each $uId \in \pi_{nId}(N_t)$ according to
Definition 3 and get the corresponding $pId_k(uId)$ (using
S.insert()). All $pId_k(uId)$ will be written back to the
pIdnew_nId column of Nt (where nId = uId) right after, so that
there is no random access to Nt. Note that although by
definition $sig_k$ is a set, we construct $sig_k(uId)$ as a string,
maintaining elements of the set in sorted order. It is both
an easy way for storing a set and handy for implementing S
later on (e.g., using a trie).

details of line 15 of Algorithm 1

  create string sig_k(uId) ← pId_0(uId)    ▷ overall scan Nt
  if uId ∈ π_sId(F) then
    for each (uId, eLabel, pIdold_tId) ∈ F do    ▷ sequentially scan F
      sig_k(uId) ← sig_k(uId) + (eLabel, pIdold_tId)
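The following sketch shows how signature strings could be produced by one synchronized pass over Nt and F. The string encoding (fields joined with "|") is an assumption chosen for readability; the paper only requires that set elements appear in sorted order. The sketch also assumes that every sId in F occurs in the node rows and that labels do not contain the separator characters.

```python
def signatures(node_rows, f_rows):
    """Yield (nId, signature string) pairs.

    node_rows: (nId, pId0_nId) sorted by nId.
    f_rows: distinct (sId, eLabel, pIdold_tId) rows, sorted lexicographically,
            so the pairs of one source node are adjacent and already ordered.
    """
    i = 0
    for n_id, pid0 in node_rows:
        parts = [str(pid0)]
        while i < len(f_rows) and f_rows[i][0] == n_id:   # forward-only cursor on F
            parts.append(f"({f_rows[i][1]},{f_rows[i][2]})")
            i += 1
        yield n_id, "|".join(parts)
```

Since F is sorted and duplicate-free, two nodes with equal signature sets always produce identical strings, so string equality can stand in for set equality inside S.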

3.2 More discussions on Algorithm [1] 

Example run. If we assume that the numbering scheme for S
is a self-increasing counter across iterations, Table 1 gives
the intermediate results of running Algorithm 1 on the
example graph in Figure 1 (k = 2), and Table 2 gives the
final output of the algorithm.

Table 2: Output of Algorithm 1 on the example graph in Figure 1 (k = 2)

(a) Nt

nId  nLabel  pId0_nId  pIdold_nId  pIdnew_nId
1    M       1         3           7
2    M       1         3           8
3    P       2         4           9
4    P       2         5           10
5    P       2         4           9
6    P       2         6           11

(b) Et

sId  eLabel  tId  pIdold_tId
3    l       1    3
1    w       2    3
2    w       2    3
5    l       2    3
4    l       3    4
1    l       4    5
2    l       6    6

Early stopping condition. It is not always necessary to
let the algorithm run k iterations. Indeed, it can be shown
(see the Appendix) that after a bounded
number of computation iterations, Algorithm 1 achieves
the full (i.e., classical non-localized) bisimulation
partition. We can detect this by simply checking the
number of partition blocks each iteration produces. If two consecutive
iterations produce the same number of partition blocks,
the algorithm has already reached the full
bisimulation partition, and therefore it is safe to terminate
the algorithm.
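The stopping test can be sketched on top of a simple in-memory refinement loop. This is an illustration only: the function name `refine_until_stable`, the in-memory dictionaries, and the per-iteration numbering are assumptions, not the disk-based procedure of Algorithm 1.

```python
# Example graph of Figure 1; edges as listed in Table 2(b).
LABELS = {1: "M", 2: "M", 3: "P", 4: "P", 5: "P", 6: "P"}
EDGES = [(3, "l", 1), (1, "w", 2), (2, "w", 2), (5, "l", 2),
         (4, "l", 3), (1, "l", 4), (2, "l", 6)]


def refine_until_stable(labels, edges, max_k):
    """Return (j, pId_j) for the first j whose block count is unchanged
    by iteration j + 1, or (max_k, pId_max_k) if that never happens."""
    nodes = sorted(labels)
    label_ids = {lbl: i for i, lbl in enumerate(sorted(set(labels.values())), 1)}
    pid0 = {u: label_ids[labels[u]] for u in nodes}
    prev = pid0
    for j in range(1, max_k + 1):
        store, cur = {}, {}
        for u in nodes:
            sig = (pid0[u],
                   frozenset((l, prev[t]) for (s, l, t) in edges if s == u))
            cur[u] = store.setdefault(sig, len(store) + 1)
        if len(set(cur.values())) == len(set(prev.values())):
            return j - 1, prev             # iteration j split no block
        prev = cur
    return max_k, prev
```

On the example graph the block counts are 2, 4, 5, 6, 6 for iterations 0 to 4, so the loop reports that the partition is already the full bisimulation partition at iteration 3. Comparing counts suffices because each iteration can only split blocks, never merge them.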

Numbering schemes of partition identifier and S. In
the algorithm, the correctness of the partition identifiers'
assignment is guaranteed level by level, meaning that the
partition block numbering scheme of iteration j is
independent of that of iteration j + 1, for example.
This means that we could use one counter for the whole
computation, or could use different counters for each
computation iteration.

The same idea also applies to implementing S. As
long as S returns distinct pIds for different signatures within
each computation iteration, it is immaterial to the work
performed by Algorithm 1 whether S is a new one for each iteration
or not. So, we could use one S for all iterations (when we
have a global counter), reusing the pId of a signature across
iterations. Furthermore, in practice this approach could
benefit from warm caching (a better hit ratio).
Moreover, for the maintenance algorithms
presented in Section 4 we would only need to store one
S instead of k of them. Essentially, if the same signature
appears many times in different iterations, we only save it
once in S. The drawback of this method is that the size
of S will keep increasing as the algorithm runs. This issue
becomes acute when the number of partitions becomes large
and the signatures are long, as we observed in some datasets
presented in Section 5.3.

Data structures for S. As the reader may have noticed,
the signature storage facility S plays an important role in
Algorithm 1. In principle, any data structure that permits
an efficient set-equality check will be sufficient. Tries and
dictionaries are such data structures, for instance. In
our experiments, we see that in many cases the partition
sizes are small and the signatures are short, for which a main
memory based data structure is enough. In other cases,
the signature length can reach several million and the partition
size tens of millions, and we then need some external
memory based solution for S. We could, for example, sort
all signatures from F in an I/O efficient way [2], and then assign
partition identifiers while scanning the sorted signatures.
In this case, the overall cost of the S.insert() operation
could still be bounded by $O(sort(|E_t|))$. Other disk based
solutions, such as disk-based tries (e.g., String B-Tree [9] or
[12]) or inverted files (e.g., [17]) could also be considered.

In our experiments we use BerkeleyDB (B-Tree or Hash
index) to mimic a trie, which, as we show in the experimental
results, has acceptable empirical behavior.
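The sort-based alternative for S mentioned above can be sketched as follows. It is a minimal in-memory illustration: `sorted` stands in for an I/O efficient external sort, signatures are assumed to be strings with set elements in a canonical order, and the name `assign_ids_by_sorting` is hypothetical.

```python
def assign_ids_by_sorting(sig_rows, first_pid=1):
    """sig_rows: iterable of (nId, signature string).

    Sorts the rows on the signature and hands out a new partition ID
    whenever the signature changes during the scan.
    """
    ordered = sorted(sig_rows, key=lambda r: r[1])   # stands in for an external sort
    result, previous, pid = {}, None, first_pid - 1
    for n_id, sig in ordered:                        # equal signatures are adjacent
        if sig != previous:
            pid += 1
            previous = sig
        result[n_id] = pid
    return result
```

The resulting identifiers follow the sort order of the signatures rather than the order in which nodes are visited, which is acceptable because only distinctness within an iteration matters. Writing the identifiers back to Nt in nId order would require one more sort on nId in an external memory setting.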



Complexity and correctness. We have the following
characterization of Algorithm 1.

Theorem 2. Let $k \ge 0$ and $G = (N, E, \lambda_N, \lambda_E)$ be a
graph. Algorithm 1 computes the k-bisimulation partition of
$G$ with I/O complexity of $O(k \cdot sort(|E_t|) + k \cdot scan(|N_t|) + sort(|N_t|))$,
and space complexity of $O(|N_t| + |E_t|)$.

A proof can be found in Appendix A.1.

Differences with previous work. Though our work is inspired by
paper [15], there are two major differences between the two.
(1) Targeting different problems. Paper [15] computes the full
bisimulation partition for directed acyclic graphs, while we
compute the localized bisimulation result regardless of the
cyclicity of the graph. (2) Using different techniques. Based
on nodes' ranks and time-forward processing, paper [15]
computes the result level by level, while our approach
constructs signatures for all nodes at once for each iteration.

4 Maintenance on the localized bisimulation 
partition result 

For the maintenance algorithms we assume that we have
constructed the k-bisimulation partition of graph
$G = (N, E, \lambda_N, \lambda_E)$, where, as before, G's Nt and Et are stored on disk.
Nt keeps the historical information of all iterations (Table 3);
Et is the same as in Algorithm 1 but has two copies with
sort orders (sId, tId) and (tId, sId) to boost performance. We
use Etst and Etts to refer to each of these copies.

Table 3: Nt for maintenance algorithms

nId  nLabel  pId0_nId  pId1_nId  ...  pIdk_nId















We further assume that we save the signature storage 
facility S on disk, which we use and update throughout the 
maintenance process. 

The maintenance problem includes the following subprob- 
lems. 

Change k. If k increases, we carry out further iterations
of computation. If k decreases, we can return the result
directly since we keep the history information in Nt.

Algorithm 2 Add a new node to existing k-bisimulation partition

 1: procedure ADD_NODE(Nt, S, (uId, uLabel), k)    ▷ uId is the new node ID, uLabel is its node label
 2:   search for row (vId, vLabel, ...) ∈ Nt where uLabel = vLabel    ▷ O(search(|Nt|))
 3:   if could find row (vId, vLabel, ...) then
 4:     use pId0_nId of vId for pId_0(uId)
 5:   else
 6:     request a new pId, use it for pId_0(uId)
 7:   get value of S.insert(pId_0(uId)), use it for pId_1, ..., pId_k of uId    ▷ some constant I/O
 8:   insert (uId, uLabel, pId_0(uId), ..., pId_k(uId)) to Nt
 9:   return (Nt, S)

Add a new node (uId, uLabel) (ADD_NODE()). When
adding a new node (uId, uLabel) (uId is the new node's
ID, uLabel is its node label), we assume the node is
isolated. In this case, we will not modify Et, but will insert
one row $(uId, uLabel, pId_0(uId), \ldots, pId_k(uId))$ to Nt. For
$pId_0(uId)$, we search for some row $(vId, vLabel, pId_0(vId), \ldots,
pId_k(vId)) \in N_t$ such that vLabel = uLabel, then we
assign $pId_0(vId)$ to $pId_0(uId)$. If we cannot find such a vId,
we request a new pId and use it for $pId_0(uId)$. For k > 0,
since $sig_j(uId)$ ($j \in \{1, \ldots, k\}$) is always $(pId_0(uId), \emptyset)$, we
use the value of $S.insert(pId_0(uId))$ for $pId_j(uId)$. Since
inserting one such signature into S requires a
constant number of I/Os, the algorithm's cost is bounded
by $O(search(|N_t|))$. Pseudo code is in Algorithm 2.

Algorithm 3 Add a set of new nodes to existing k-bisimulation partition

 1: procedure ADD_NODES(Nt, S, newNodes, k)    ▷ newNodes is a table of new nodes
 2:   Nt ← sort(Nt) by nLabel    ▷ O(sort(|Nt|))
 3:   newNodes ← sort(newNodes) by nLabel    ▷ O(sort(|Nt|))
 4:   newNodes ← π_α(newNodes ⋈_φ Nt), remove duplicates    ▷ O(scan(|Nt|))
        α : (newNodes.nId, newNodes.nLabel, Nt.pId0_nId, ...)
        φ : newNodes.nLabel = Nt.nLabel
 5:   request a new pId for each new nLabel in newNodes, fill in all the NULL fields in newNodes.pId0_nId
 6:   for each uId ∈ π_nId(newNodes) do    ▷ overall O(scan(|Nt|)) + cost of S
 7:     get value of S.insert(pId_0(uId)), use it for pId1_nId, ..., pIdk_nId of uId
 8:   append newNodes to Nt
 9:   return (Nt, S)

Algorithm 4 Add a new edge to existing k-bisimulation partition

 1: procedure ADD_EDGE(Nt, Etst, Etts, S, (s, l, t), k)    ▷ (s, l, t) is the new edge
 2:   if k = 0 then
 3:     insert (s, l, t) to Etst    ▷ O(search(|Et|))
 4:   else    ▷ k > 0
 5:     Nt ← sort(Nt) by nId    ▷ O(sort(|Nt|))
 6:     create empty priority queue pQueue    ▷ overall O(sort(|Nt|))
 7:     for j ∈ {1, ..., k} do
 8:       enqueue (j, s) to pQueue
 9:     insert (s, l, t, pId_0(t)) to Etst and Etts    ▷ O(search(|Et|))
10:     while pQueue is not empty do
11:       dequeue all pairs (j, uId) from pQueue with the same (i.e., smallest) j value, save all distinct uId to M
12:       F ← σ_{sId ∈ M}(Etst)    ▷ merge join, O(scan(|Nt|) + scan(|Et|))
13:       fill in the pIdold_tId column of F    ▷ O(scan(|Nt|)) + O(sort(|F|)) + O(scan(|Et|))
14:       H ← π_α(F), where α = (sId, eLabel, pIdold_tId)
15:       H ← sort on sId, eLabel, pIdold_tId, and remove duplicates    ▷ O(sort(|Et|))
16:       for all uId ∈ M do    ▷ scan M, Nt and H, overall O(scan(|Nt|)) + O(scan(|Et|)) + cost of S
17:         construct sig_j(uId) from H
18:         pId_j(uId) ← S.insert(sig_j(uId))
19:         if pId_j(uId) is not the same as the corresponding value in Nt.pIdj_nId then
20:           propagate changes to Nt and pQueue    ▷ O(scan(|Nt|)) + O(scan(|Et|))
21:   return (Nt, Etst, Etts, S)
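The single-node insertion can be sketched in memory as follows. The data layout is an assumption made for the illustration: the node table is a dictionary from node ID to `(nLabel, [pId_0, ..., pId_k])`, S is a dictionary from signature to partition ID, and the signature of an isolated node is represented as the pair `(pId_0, frozenset())`. The function name and the explicit `next_pid` counter are hypothetical.

```python
def add_node(node_table, store, next_pid, u_id, u_label, k):
    """Insert an isolated node and return the next unused partition ID.

    node_table: {nId: (nLabel, [pId_0, ..., pId_k])}
    store:      {signature: pId}, playing the role of S
    """
    # Reuse pId_0 of any existing node with the same label (a linear search here).
    pid0 = next((pids[0] for (lbl, pids) in node_table.values()
                 if lbl == u_label), None)
    if pid0 is None:                       # label not seen before
        pid0, next_pid = next_pid, next_pid + 1
    higher = []
    if k > 0:
        sig = (pid0, frozenset())          # no outgoing edges at any level
        if sig not in store:
            store[sig], next_pid = next_pid, next_pid + 1
        higher = [store[sig]] * k          # same signature for j = 1..k
    node_table[u_id] = (u_label, [pid0] + higher)
    return next_pid
```

Because the node has no outgoing edges, one lookup in S settles all k levels at once, which is why the cost is dominated by the search for a matching label.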

Add a set of new nodes (ADD_NODES()). If we want to
add many nodes at once, we can again make use of
sort and scan instead of random access (e.g., search) on
tables. Here we assume we have the new nodes stored in
the newNodes table, which has the same schema as Nt, and
that $|newNodes| = O(|N_t|)$. For adding a set of new nodes,
the idea is the same as for adding one. We first sort
Nt and newNodes by nLabel, then perform a merge join on
the nLabel column to fill in the pId0_nId column of newNodes
for all the existing nLabels. For the missing ones, we request
a new pId for each new nLabel. Then we get
$pId_1, \ldots, pId_k$ of each new node by inserting its $pId_0$ into S.
At the end we append the whole newNodes table to Nt. The I/O
complexity of ADD_NODES() is bounded by $O(sort(|N_t|))$.
Pseudo code is in Algorithm 3.

Add a new edge {s,l,t) ( Add_Edge() ). When adding one 
edge to the graph (here s, I, t are source node ID, edge label, 
and target node ID, resp.), we assume that we add the edge 
between two existing nodes. If this is not the case, we call 
procedure Add_Node() or Add_Nodes() first. 

The potential changes caused by the edge insertion are to
$sig_j(s)$ ($1 \le j \le k$), as well as to the signatures of all
ancestors of $s$ within $k$ steps. So the main work is to detect
whether $sig_j(s)$ changes and to propagate any such change to the
signatures of its parent nodes in later iterations. We use a
priority queue pQueue to record and process such changes in a
systematic, level-wise manner. For a node ID $uId$ and iteration
$j$, pQueue stores the pair $(j, uId)$ as priority reference.
Whenever we dequeue one element from pQueue, we get the smallest
node ID from the lowest iteration (lowest priority reference).
Therefore pQueue indicates those nodes whose signatures could
change in each iteration level (from 1 up to $k$).
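The level-wise dequeue order can be seen with Python's `heapq`, which orders tuples lexicographically. The helper name `drain_level` is hypothetical; the sketch assumes an in-memory heap rather than an external-memory priority queue.

```python
import heapq


def drain_level(pqueue):
    """Pop every (j, uId) pair that shares the smallest iteration j.

    Returns (j, sorted list of distinct uIds), or (None, []) if empty."""
    if not pqueue:
        return None, []
    level = pqueue[0][0]
    ids = []
    while pqueue and pqueue[0][0] == level:
        _, uid = heapq.heappop(pqueue)
        if not ids or ids[-1] != uid:   # equal pairs pop adjacently, so skip repeats
            ids.append(uid)
    return level, ids
```

Pairs for a later iteration stay in the queue until every pair of the current iteration has been handled, which is the property the update algorithm relies on.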

At the beginning of the algorithm, we enqueue $(j, s)$ to pQueue
($0 < j \le k$). Then, while pQueue is not empty, we dequeue the
list of $(j, uId)$ pairs with the same $j$, construct the new
signature of each such $uId$, insert it into $S$, and compare the
returned $pId_j(uId)$ with the old pId_j_nId value of $uId$. If the
pId remains the same as the old one, we continue; if it changes, we
record $pId_j(uId)$ in $N_t$, and enqueue all $(j + 1, vId)$ pairs
to pQueue where $vId \in \pi_{sId}(\sigma_{tId=uId}(E_t))$. Pseudo
code is given in Algorithm 4 and a detailed discussion is in
Section 4.1.
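The loop described above can be sketched in memory as follows. This is an illustration under assumptions, not the paper's Algorithm 4: `add_edge` and its argument layout are hypothetical, the store is a dictionary whose pIds are assumed to be exactly 1..len(store), and table scans are replaced by direct iteration over an edge set.

```python
import heapq


def add_edge(nodes, edges, store, s, l, t, k):
    """nodes: {nId: [pId_0, ..., pId_k]}; edges: set of (sId, eLabel, tId);
    store: {signature: pId}, standing in for S. All three are updated in place."""
    edges.add((s, l, t))
    pq = [(j, s) for j in range(1, k + 1)]   # sig_j(s) may change for every j >= 1
    heapq.heapify(pq)
    while pq:
        j = pq[0][0]
        todo = set()
        while pq and pq[0][0] == j:          # all nodes queued for iteration j
            todo.add(heapq.heappop(pq)[1])
        for u in sorted(todo):
            # sig_j(u) = (pId_0(u), {(eLabel, pId_{j-1}(child))})
            sig = (nodes[u][0], frozenset(
                (lab, nodes[v][j - 1]) for (src, lab, v) in edges if src == u))
            pid = store.setdefault(sig, len(store) + 1)
            if pid != nodes[u][j]:
                nodes[u][j] = pid            # record the new pId_j(u)
                if j < k:                    # parents must be rechecked next round
                    for (src, lab, v) in edges:
                        if v == u:
                            heapq.heappush(pq, (j + 1, src))
    return nodes
```

Updating `nodes[u][j]` immediately is safe here because signatures in iteration j only read column j - 1, which was finalized in the previous iteration.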



Add a set of new edges (Add_Edges()). For adding a set of edges,
we also assume that the edges are added between existing nodes. If
this is not the case, we first call procedure Add_Nodes(). Then we
change line 8 of Algorithm 4 to "enqueue a set of $(j, s)$ pairs to
pQueue". This means that in every iteration we may modify the
signatures of a set of nodes instead of one. The rest of the
procedure is exactly the same. Therefore the I/O complexity remains
the same as that of Algorithm 4.

Deletions. Deletions follow a similar idea to insertions.
Removing an edge $(s, l, t)$ works like adding one: we
(potentially) modify the signature of $s$ and propagate changes to
its ancestors via pQueue, and the reasoning is the same. When
removing a node, we first remove each incoming edge and each
outgoing edge of that node, and then remove the node from the node
table.

4.1 Details of Algorithm 4 (Add_Edge())

4.1.1 Input and output 

The input variables of Algorithm 4 are node table $N_t$, edge
tables $E_{tst}$ and $E_{tts}$, the signature storage facility $S$,
the new edge $(s, l, t)$ and $k$. The output variables of Algorithm
4 are $N_t$, $E_{tst}$, $E_{tts}$ and $S$. $N_t$'s schema is given
in Table O while the schema of $E_{tst}$ and $E_{tts}$ is the same
as in Algorithm 1.

4.1.2 k = 0 (Algorithm 4)

For $k = 0$, since all node information (including the pId_0_nId
column) is already filled in $N_t$, we only need to add a new row
$(s, l, t)$ to $E_{tst}$.

4.1.3 k > 0, lines 7 to 20 of Algorithm 4

For $k > 0$, in each iteration, indicated by $j$ in the algorithm,
we need to (1) find the nodes whose signatures could have changed;
(2) check whether these signatures have in fact changed; and (3)
propagate any such changes to the parents of these nodes. To record
the candidate nodes and to perform the propagation, we use a
priority queue pQueue. pQueue takes $(j, nId)$ as both element and
priority reference, where $j$ is the iteration level and $nId$ is
the node identifier. To check signature changes, we reuse the
signature storage facility $S$.

When adding a new edge $(s, l, t)$ to the graph, all $sig_j(s)$
($j > 0$) have the potential to change, and hence we add all pairs
$(j, s)$, for $j \in \{1, \ldots, k\}$, to pQueue, indicating that
we need to check the signature of $s$ in every iteration (lines 7
to 8). For each iteration $j > 0$, we dequeue from pQueue all node
IDs in the smallest iteration $j$, remove duplicates, and save them
to a temporary table $M$, so that $M$ contains in sorted order all
node IDs whose signatures could change in iteration $j$. Then we
create an extra table $F$ in preparation for signature
construction. This is achieved by performing a merge join of
$E_{tst}$ and $M$ (where $E_{tst}.sId \in M$). Then we fill in the
F.pId_old_tId column, as in Algorithm 1.

details of line 13 of Algorithm 4

    $F \leftarrow sort(F)$ by $tId$                       ▷ $O(sort(|E_t|))$
    $F \leftarrow \pi_{\alpha}(F \bowtie_{\phi} N_t)$     ▷ $O(scan(|E_t| + |N_t|))$
        $\alpha$ : $(F.sId, F.eLabel, F.tId, N_t.pId_{j-1}\_nId)$
        $\phi$ : $F.tId = N_t.nId$
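The sort and join above can be mimicked on in-memory lists. This is a sketch under assumptions: `fill_old_pids` and the row layouts are hypothetical, and Python's `sorted` stands in for the external-memory sort.

```python
def fill_old_pids(f_rows, nt_rows, j):
    """f_rows: (sId, eLabel, tId); nt_rows: (nId, [pId_0, ..., pId_k]).
    Returns (sId, eLabel, tId, pId_{j-1}(tId)) rows, ordered by tId."""
    f_sorted = sorted(f_rows, key=lambda r: r[2])     # F <- sort(F) by tId
    nt_sorted = sorted(nt_rows, key=lambda r: r[0])   # N_t ordered by nId
    out, i = [], 0
    for s_id, e_label, t_id in f_sorted:              # one scan over both inputs
        while i < len(nt_sorted) and nt_sorted[i][0] < t_id:
            i += 1
        if i < len(nt_sorted) and nt_sorted[i][0] == t_id:
            out.append((s_id, e_label, t_id, nt_sorted[i][1][j - 1]))
    return out
```

The cursor `i` only moves forward, so the join itself costs one pass over each sorted input.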



After projecting $F$ on (sId, eLabel, pId_old_tId) and removing
duplicates, we get $H$ and are ready to construct the signatures.
For each $uId \in M$, we construct $sig_j(uId)$ according to the
signature definition. The idea of constructing the nodes'
signatures is the same as in line 15 of Algorithm 1, only in this
case we are not considering every node but only those appearing in
pQueue (and later in $M$).

We then call $S.insert(sig_j(uId))$ for all such $uId$. If $S$
returns the same $pId_j(uId)$ as recorded in $N_t$.pId_j_nId,
nothing happens; otherwise we change the $N_t$.pId_j_nId entry of
$uId$ accordingly and propagate the change via pQueue: if $j < k$,
we add all parents of $uId$ to pQueue to indicate that we will
check these nodes' signatures in iteration $j + 1$. This is
achieved with the help of $E_{tts}$.

details of line 20 of Algorithm 4

    record the new $pId_j(uId)$ in the corresponding row in $N_t$;   ▷ overall $O(scan(|N_t|))$
    if $j < k$ then
        $H \leftarrow \sigma_{tId=uId}(E_{tts})$;                    ▷ $O(search(|E_t|))$
        for all (sId, eLabel, tId, pId_old_tId) $\in H$ do           ▷ overall $O(scan(|E_t|))$
            enqueue $(j + 1, sId)$ to pQueue;
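The selection on $E_{tts}$ is a search followed by a short scan. A minimal in-memory sketch, assuming the table is a list of (tId, eLabel, sId) tuples sorted by tId (the name `parents_of` is hypothetical):

```python
from bisect import bisect_left


def parents_of(etts, uid):
    """etts: (tId, eLabel, sId) rows sorted by tId.
    Returns the source IDs of all edges that point into uid."""
    i = bisect_left(etts, (uid,))   # search: first row with tId >= uid
    out = []
    while i < len(etts) and etts[i][0] == uid:   # scan the matching run
        out.append(etts[i][2])
        i += 1
    return out
```

Each returned source ID would then be enqueued as $(j + 1, sId)$.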

Complexity and correctness. We have the following
characterization of Algorithm 4.

Theorem 3. Let $G = (N, E, \lambda_N, \lambda_E)$ be a graph and
$k > 0$. After adding a new edge to $G$, Algorithm 4 correctly
updates the k-bisimulation partition of $G$ with I/O complexity of
$O(k \cdot sort(|E_t|) + k \cdot sort(|N_t|))$, and space
complexity of $O(k \cdot |N_t| + k \cdot |E_t|)$.

A proof can be found in Appendix A.1.

4.2 More discussions on Algorithm 4

Example run. We present different behaviors of Algorithm 4 using
two examples. Here we extend the graph from Figure 1 as in Figure
2. The dashed lines in this figure indicate the two edges which we
will add in our examples.





Figure 2: Updates on the example graph 

First suppose we add edge $(2, l, 7)$ to the original graph of
Figure 1, where node 7 is a new node with label P. Table 4 shows
the resulting partition after this insertion. The new/changed part
of the table is indicated in gray. When the algorithm starts, (2,1)
and (2,2) are added to pQueue. After checking each of these, the
algorithm finds no change in node 2's signature, so no change
propagates and the algorithm stops. Compared with Table 1, the only
change is one more row (node 7) in the table. Since node 7 does not
have outgoing edges, adding one edge that points into node 7 does
not change any existing node's signature. Node 7 belongs to the
group of node 6, and no other node changes group membership.

In the second case, suppose we add edge $(6, l, 5)$ to the
original graph of Figure 1. The algorithm first adds (6,1) and
(6,2) to pQueue. Then in iteration 1, the algorithm detects that
the signature of node 6 does change, and therefore adds one new
pair (2,2) to pQueue. In iteration 2, the signatures of both node 2
and node 6 are checked, and both have changed. We see in Table 5
that $pId_2(1)$ and $pId_2(2)$ become the same, while $pId_2(6)$
changes from 11 to 10.

Table 4: 2-bisimulation for the example graph after edge insertion
(2,l,7)

nId  pId_0(nId)  sig_1(nId)         pId_1(nId)  sig_2(nId)         pId_2(nId)
1    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,5)}    7
2    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,6)}    8
3    2           2,{(l,1)}          4           2,{(l,3)}          9
4    2           2,{(l,2)}          5           2,{(l,4)}          10
5    2           2,{(l,1)}          4           2,{(l,3)}          9
6    2           2,∅                6           2,∅                11
7    2           2,∅                6           2,∅                11



Table 5: 2-bisimulation for the example graph after edge insertion
(6,l,5)

nId  pId_0(nId)  sig_1(nId)         pId_1(nId)  sig_2(nId)         pId_2(nId)
1    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,5)}    7
2    1           1,{(w,1),(l,2)}    3           1,{(w,3),(l,5)}    7
3    2           2,{(l,1)}          4           2,{(l,3)}          9
4    2           2,{(l,2)}          5           2,{(l,4)}          10
5    2           2,{(l,1)}          4           2,{(l,3)}          9
6    2           2,{(l,2)}          5           2,{(l,4)}          10



When to switch back to Algorithm 1. As we will see in our
empirical study (Section 5.5), it is not always beneficial to use
Algorithm 4, since it performs some extra work in each iteration.
Some heuristics could be adopted to let the algorithm decide when
to switch back to Algorithm 1. For example, if at a certain
iteration most of the nodes are put into pQueue, it is more
beneficial to switch back to Algorithm 1. This could be done by
simply checking the size of pQueue at the beginning of each
iteration.
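One possible form of such a check is shown below. The function name `should_rebuild` and the default threshold of 0.5 are assumptions made for illustration; the text does not fix a value for "most of the nodes".

```python
def should_rebuild(queued_nodes, total_nodes, threshold=0.5):
    """Return True when the share of nodes queued for the coming iteration
    exceeds the (hypothetical) threshold, suggesting a switch to a full rebuild."""
    if total_nodes == 0:
        return False
    return queued_nodes / total_nodes > threshold
```

In practice the threshold would have to be tuned per dataset, since the break-even point depends on how quickly changes propagate.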

5 Empirical Analysis 

In this section we present the results of an in-depth 
experimental study of our algorithms. After introducing 
our set-up, we first discuss a validation of the correctness 
of our algorithms by several experiments. We then show 
the performance of the algorithms on both synthetic and 
real datasets. In these experiments, various aspects of the 
algorithms are investigated while other settings are fixed. 

5.1 Experiment setting 

The following experiments are run on a machine with a 2.27 GHz
Intel Xeon (L5520, 8192KB cache) processor and 12GB main memory,
running Fedora 14 (64-bit) Linux. We use C++ to implement all the
algorithms, with GCC 4.4.4 as the compiler. We use the open-source
STXXL library [5] to construct the tables and perform the external
memory sorting, and use Berkeley DB to implement $S$. One $S$ is
used for all computation iterations (as discussed in Section 3.2).
In the experiments we do not exploit any parallelism and restrict
ourselves to predefined buffer sizes. We record the running time as
well as the I/O volume between the buffer and the disk system.
Therefore, the running times of the experiments are comparable to
those on a commodity PC, and the I/O volume can be reproduced on
other systems. In the following experiments, we set both the STXXL
buffer and the Berkeley DB buffer to 128MB, unless otherwise
indicated. Please note that we run the experiments for the Twitter
dataset on a different machine (Intel Xeon E5520, 2.27 GHz, 8192KB
cache, 70GB main memory, same OS) because of limited disk space,
using a 512MB buffer setting.



Datasets. To demonstrate the practicality of the algorithms, we
experiment with various graph datasets. The datasets are collected
from public repositories, ranging from synthetic data to real-world
data, and from several million edges to more than 1.4 billion
edges. In Table 6 we give a description of the datasets, as well as
some simple statistics. All datasets were accessed on 15 May 2012.
Note that due to space limitations, in the following we show the
experiment results on a subset of the datasets when the result is
representative enough.

Table 6: Description and statistics of the experiment datasets

Data Name   Description                                                  Node Count    Edge Count      Label on
Jamendo     A repository of music metadata in RDF format                 486,320       1,049,647       Edge
LinkedMDB   A repository of movie metadata in RDF format                 2,330,695     6,147,996       Edge
DBLP        An RDF format DBLP dump                                      23,000,670    50,203,406      Edge
WikiLinks   A page-to-page linking graph of Wikipedia                    5,710,993     130,160,392     None
DBPedia     An early RDF dump of DBPedia                                 38,615,135    115,305,444     Edge
Twitter     A following relationship graph of Twitter                    41,652,230    1,468,365,182   None
SP2B        An RDF data generator for arbitrarily large DBLP-like data   280,908,393   500,000,912     Edge
BSBM        An RDF data generator for e-commerce use case                8,886,078     34,872,182      Edge



5.2 Validation of implementations 

We validate the correctness of the implementation of our
algorithms by comparing experiment results against other existing
solutions. The first algorithm is the classic bisimulation
algorithm from [24], which computes the full bisimulation of the
graph. We implement this algorithm in Python, run it on small
datasets from the Stanford Large Network Dataset Collection
(p2p-Gnutella04, 05, 06, 08), and compare the output with Algorithm
1 while setting $k$ to 100. The two algorithms produce the same
result.

We also validate Algorithm 1 against [15]. Since we can handle any
kind of directed graph, we should be able to handle acyclic graphs.
We use the random DAG generator provided along with [15] to
generate several graphs and test them on both algorithms. They
produce the same partition results, as expected.

Furthermore, we validate the algorithm Add_Edge() against the
algorithm Build_Bisim(). In this experiment, for the same dataset,
we first compute the k-bisimulation partitions using Build_Bisim();
then we split the dataset into two parts, using the first part as
the building block and the second part as the edges to be updated,
applying Add_Edge() many times on the second part. Both algorithms
produce the same results.
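Comparing two partitions for "the same result" has to ignore the concrete pId values, since two runs may number the blocks differently. A minimal sketch of such a comparison (the name `same_partition` and the dictionary layout are assumptions, not part of the original validation code):

```python
def same_partition(pids_a, pids_b):
    """pids_a, pids_b: {nId: pId}. True if both group the nodes identically,
    regardless of how the blocks are numbered."""
    if pids_a.keys() != pids_b.keys():
        return False
    a_to_b, b_to_a = {}, {}
    for n in pids_a:
        a, b = pids_a[n], pids_b[n]
        # The block renaming must be a bijection in both directions.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True
```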



Jamendo: http://dbtune.org/jamendo/
LinkedMDB: http://www.linkedmdb.org/
DBLP: http://thedatahub.org/dataset/l3s-dblp
WikiLinks: http://haselgrove.id.au/wikipedia.htm
DBPedia: http://www.cs.vu.nl/~pmika/swc/btc.html
Twitter: http://an.kaist.ac.kr/traces/WWW2010.html
SP2B: http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B
BSBM: http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/
Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/index.html

Figure 3: Experiment results for Algorithm 1 for real and synthetic
datasets (k = 10)

5.3 Experiments on the localized bisimulation construction
algorithm (Build_Bisim())



In Figure 3 we show the experiment results for Algorithm 1 on all
datasets. We compute the 10-bisimulation (i.e., $k = 10$) of these
datasets, and measure many aspects of the running behavior for each
iteration. Concerning time measurement, we run every experiment 5
times and take the average. $S$ uses Berkeley DB's B-Tree index in
this experiment. Readers can find detailed numbers from these
experiments in Table 7, found in the Appendix.

In Figure 3a we show the number of partition blocks every
iteration produces for all datasets. We see that the numbers vary
from one dataset to another, where the difference is sometimes more
than 10 times and, interestingly, does not directly relate to the
size of the dataset. In certain cases (e.g., Twitter) the partition
size is quite large. Moreover, many of the datasets (e.g., Jamendo,
LinkedMDB, DBLP, etc.) reach full bisimulation after 5 iterations.
In fact, all datasets (including Twitter) are sufficiently
partitioned after 5 iterations of computation. Here we can
reasonably argue that even for the Twitter dataset, the partition
after 5 iterations is already too refined (i.e., (partition
count)/(node count) > 0.8).

Figure 3b shows the maximum length of signatures for each
iteration. We observe that the signature length is usually quite
short, especially compared with the size of the graph. But there
are still cases (e.g., Twitter) where the signature becomes very
long (more than 1 million integers), which stresses the need for an
I/O efficient solution for $S$. Note that the synthetic datasets,
such as BSBM and SP2B, reach their full bisimulation partition
after 3 iterations of computation and have rather short signatures,
indicating that they are highly structured.

Figures 3c and 3d show the I/O volume spent on sorting/scanning
(STXXL) and on interacting with $S$ (Berkeley DB). We see that for
most of the datasets there is no dramatic change across iterations.
But for WikiLinks and Twitter, the two datasets which have very few
partition blocks at the beginning and many at the end, there is a
big difference in the I/O on $S$ between iterations. In this case
I/O on $S$ becomes comparable to that of sort and scan (I/O on
STXXL).

Figure 3e shows the time spent on preparing the signature (lines 5
to 13 in Algorithm 1) for each iteration, which is quite stable for
all datasets. Figure 3f shows the time spent on constructing the
signature and inserting it into $S$ (lines 14 to 17 in Algorithm
1). In this case datasets with higher degrees tend to cost more
time in later iterations, which correlates with their longer
signatures and larger number of partition blocks. For all datasets,
however, constructing and looking up signatures is the dominant
cost in each iteration. This motivates further optimization of
signature construction and of the implementation of $S$.

We can conclude that the algorithm is practical to use. It 
can process a graph with 100 million edges (e.g., WikiLinks 
and DBPedia) in under 700 seconds for one iteration, and 
performance scales (almost) linearly with the number of 
nodes and edges. 

5.3.1 Different implementations of S 



As we mentioned in Section 3.2, $S$ could be implemented in
several ways. In Figure 4 we compare the overall I/O performance of
Build_Bisim() using B-Tree and Hash indexes for $S$ on several
datasets. We see that the B-Tree implementation slightly
outperforms the Hash index for all datasets. This is most likely
due to small caching effects and locality of reference during
construction of the signatures.

Figure 4: I/O comparison for B-Tree and Hash index of S (k = 10)



5.3.2 The effect of different buffer sizes

We allocate two buffers, one for scan and sort (the STXXL buffer
in our case) and one for $S$ (the Berkeley DB buffer in our case),
in order to analyze the impact of buffer size on our algorithms. We
take the DBPedia dataset as an example, since it is large enough to
show the buffer effects. For the sort/scan setting, we vary the
buffer size from 16MB to 512MB, while keeping the $S$ buffer at
128MB, and record the I/O between the buffer and the disk system.
From Figure 5a we see that a bigger buffer does improve the
performance. But since we only gain in the external memory sorting
part, a certain amount of I/O is inevitable in each iteration. Note
that iteration 1 has a higher I/O cost because extra sorts on $N_t$
and $E_t$ are performed in that iteration.

Figure 5: I/O for different buffer size settings for sort/scan and
S (k = 10)

For the setting on $S$, we vary the buffer size from 16MB to
512MB, while keeping the sort/scan buffer at 128MB, and record the
I/O between the buffer and the disk system. From Figure 5b we also
see that a larger buffer brings less I/O, as expected. However, in
this case the buffer size has a bigger impact on the I/O
performance. This indicates that, given a fixed amount of memory,
it is more beneficial to allocate more memory to the $S$ buffer
than to the sort/scan buffer. Note that the $S$ buffer also shows
quite a high hit ratio during execution (more than 0.98 for DBPedia
in all settings).

5.3.3 Scalability 

In order to measure how well the algorithm scales, we generate
SP2B datasets of different sizes (edge count 1M, 5M, 10M, 50M,
100M, 500M), and measure the I/O and elapsed time for each dataset.
In Figure 6 we see that the time spent on each edge is on the order
of 10~^ seconds, and the I/O spent on each edge is under 4000 bytes
(which is one typical disk page size). The algorithm's performance
scales (almost) linearly with the data size.

Figure 6: Time and I/O spent on each edge on average (k = 10)

5.4 Experiments on single edge update algorithm (Add_Edge())

Edge updates are common operations for graph data. For our
datasets, adding one edge means adding a link between two wiki
pages (WikiLinks), adding more information to one publication or
author (DBLP), following one more person (Twitter), and so on.
Sometimes we would also like to add several edges at once. So in
this subsection we test the performance of Algorithm 4
(Add_Edge()), and in the next subsection the batch update version
(Add_Edges()).

To create the dataset for testing, we randomly take one edge from
the edge set, perform Build_Bisim() on the rest of the dataset, and
apply Add_Edge() on this edge. We believe the edge selection is
more natural this way, since it takes into account the distribution
of edges among nodes. We repeat the experiment 10 times and take
the average of the measured numbers. In Figure 7a we show how many
nodes are checked in each iteration when adding one edge to the
graph. In Figure 7b we show how many nodes actually change their
partition IDs in each iteration. From the figures we see that the
behavior varies across datasets; graphs with larger degrees tend to
propagate more changes to later iterations, which agrees with our
intuition.

Since there is a chance that many nodes are changed but all
belong to a small set of partitions, we also show how many
partitions change their members in each iteration (Figure 7c). We
see that the behavior in Figure 7c is closely related to that in
Figure 7b.

5.4.1 Comparison of Build_Bisim() and Add_Edge() 

After edge insertion, if there is no update algorithm available,
the only way to get the k-bisimulation partition is to execute
Build_Bisim() from scratch on the new dataset. This is therefore
the baseline against which we compare the Add_Edge() algorithm. In
the following we compare the overall I/O and time (Figure 8) of the
two algorithms. We see that the Add_Edge() algorithm indeed always
achieves better performance than using Build_Bisim() to recompute
the k-bisimulation partition from scratch, with up to an order of
magnitude improvement.

5.4.2 Comparison of Build_Bisim() and Add_Edge() 
in extreme cases 

From the above experiments, we see that the performance of the
algorithms is highly related to the datasets they process. For some
datasets, the update algorithm is very much favorable, while in
other cases not so much. In the following, we would like to gain a
better understanding of this phenomenon.

Figure 8: I/O and time comparison for Build_Bisim() and
Add_Edge() after inserting one edge to the dataset (k = 10)

We achieve this with two synthetic datasets, triggering the two
extreme cases where the construction algorithm benefits the most
and where the update algorithm benefits the most. Dataset one
(named Dbest) shows a best-case scenario for the update algorithm
relative to the construction algorithm. In this case we create a
full k-ary tree, with edges pointing from parents to their
children. When adding one edge to the tree, we add it to a leaf
node, so that no node's signature changes after the insertion. In
this case the update algorithm does the least amount of work,
without propagating any change to further iterations during
execution. Figure 9a shows an example of Dbest, which is a binary
tree with height 3. The dashed edge is the newly added edge.

Dataset two (named Dworst) exhibits a worst-case scenario for the
update algorithm, relative to the construction algorithm. In this
case we create a complete graph, with edges all labeled with x.
Then when adding one more edge (labeled y) to one of the nodes,
every other node is affected in each iteration and therefore the
signatures of all nodes change. The update algorithm has to check
all nodes in every iteration. Figure 9b shows an example of Dworst,
a complete graph with 5 nodes. The dashed edge is the newly added
edge.

Figure 9: Examples for Dbest (a) and Dworst (b) datasets
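Edge lists of the two shapes described above could be generated along the following lines. This is a sketch with several assumptions: the function names are hypothetical, nodes are numbered from 0 in breadth-first order, "height" counts edge levels, the tree edges carry the label x (the text does not state their label), and the complete graph is taken to have a directed edge in both directions between every pair of nodes.

```python
def full_tree_edges(arity, height, label="x"):
    """Edges (parent, label, child) of a full tree, pointing from parents
    to children; nodes are numbered in breadth-first order starting at 0."""
    edges, first, width = [], 0, 1
    for _ in range(height):
        next_first = first + width            # ID of the first node one level down
        for p in range(width):
            for c in range(arity):
                edges.append((first + p, label, next_first + p * arity + c))
        first, width = next_first, width * arity
    return edges


def complete_graph_edges(n, label="x"):
    """All directed edges between distinct nodes 0..n-1, with a single label."""
    return [(u, label, v) for u in range(n) for v in range(n) if u != v]
```

For example, `full_tree_edges(2, 3)` yields the 14 edges of a binary tree with three edge levels, and `complete_graph_edges(5)` yields 20 directed edges.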




Figure 10: Time and I/O comparison for Dbest and Dworst by
applying the Build_Bisim() and Add_Edge() algorithms on both
(k = 10)

We generate Dbest and Dworst on the scale of 100 million edges,
and measure the elapsed time and I/O costs (Figure 10) for both the
construction (Build_Bisim()) and edge update (Add_Edge())
algorithms in each iteration. We see that for Dbest the update
algorithm indeed shows a 4 times speed-up in time compared with the
construction algorithm. For Dworst, the update algorithm is 2 times
slower than the construction algorithm.

5.5 Experiments on multiple edges update 
algorithm (Add_Edges()) 

For the Add_Edges() algorithm, we randomly select a set of edges
from the dataset (edge count 10, 100, ..., 1M), and apply the
algorithm to it, comparing it with the Add_Edge() algorithm. In
Figure 11 we show the overall I/O and time for each of these cases
(taking the average). The data points where $x = 10^0$ are the ones
for the Add_Edge() algorithm. We also use horizontal lines of the
same shapes to show the performance of the Build_Bisim() algorithm,
to indicate at which point it becomes more beneficial to switch
from one to the other. From the figures we see that for smaller
datasets, it is beneficial to do batch updates (Add_Edges()) in
most cases. For larger datasets, however, changes propagate rapidly
in the first few iterations, so the construction algorithm
(Build_Bisim()) becomes a better choice when there are more than
ten edges to be updated.

6 Conclusion and future work 

In this paper we have presented, to our knowledge, the first I/O
efficient general-purpose algorithms for constructing and
maintaining localized bisimulation partitions on massive
disk-resident graphs. A theoretical analysis showed, and an
extensive empirical study confirmed, that our algorithms are not
only efficient and practical to use, but also scale well with the
size of the data.

Figure 11: Time and I/O comparison for updating different numbers
of edges, using Add_Edge() for one edge and Add_Edges() for more
edges (k = 10)

We close by listing a few promising research directions 
for further study. First, we could more deeply investi- 
gate the properties and performance of our solutions on 
different datasets. Second, it would be very interesting 
to explore adaptations and extensions of our algorithms 
for new hardware platforms (e.g., multicore, SSD). Third, 
as we mentioned at various points, many alternative data 
structures and join algorithms can be investigated for 
optimizing various aspects of the proposed algorithms. 
Fourth, many aspects of our algorithms naturally lend 
themselves to parallel or distributed solutions; this is 
certainly an interesting direction for further study. Last but 
not least, the novel ideas developed in this paper provide a 
basis for investigating related problems such as computing 
and maintaining simulation partition in external memory 
(e.g.. El). 
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APPENDIX 
A Proofs 

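A.1 Proofs for the body

Proposition 4. $u \approx^{k} v \Rightarrow u \approx^{k-1} v$
($k > 0$).

Proof for Proposition 4. By induction on $k$.

(1) $k = 1$. This is obvious, as 0-bisimilarity just enforces
equality of node labels.

(2) $k > 1$. Assume that this holds for $j - 1$
($u \approx^{j-1} v \Rightarrow u \approx^{j-2} v$,
$0 < j - 1 < k$); we want to show that this also holds for $j$
($u \approx^{j} v \Rightarrow u \approx^{j-1} v$). Let
$u \approx^{j} v$. According to the definition, for all outgoing
edges $(u, u') \in E$, there exists some edge $(v, v') \in E$ such
that $u' \approx^{j-1} v'$ and
$\lambda_E(u, u') = \lambda_E(v, v')$, and vice versa. Since
$u' \approx^{j-1} v' \Rightarrow u' \approx^{j-2} v'$, we have
$u' \approx^{j-2} v'$, and then we have $u \approx^{j-1} v$. So
$u \approx^{j} v \Rightarrow u \approx^{j-1} v$. □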

Proof for Proposition 1. $\Rightarrow$:

(1) For $k = 0$, this is trivial, since $pId_0(u) = pId_0(v)$.

(2) For $k > 0$ (which also means $u \approx^{k} v$), we want to
show that $sig_k(u) = sig_k(v)$. According to Proposition 4,
$u \approx^{k} v \Rightarrow u \approx^{0} v$, so that
$pId_0(u) = pId_0(v)$. And for each outgoing edge $(u, u')$ of $u$,
there exists some outgoing edge $(v, v')$ of $v$ such that
$u' \approx^{k-1} v'$, hence $pId_{k-1}(u') = pId_{k-1}(v')$, and
$\lambda_E(u, u') = \lambda_E(v, v')$. Therefore each pair in
$sig_k(u)$ equals some pair in $sig_k(v)$, and vice versa. Then we
have $sig_k(u) = sig_k(v)$.

$\Leftarrow$:

(1) For $k = 0$, this is obvious.

(2) For $k > 0$. Let $sig_k(u) = sig_k(v)$; we want to show that
$pId_k(u) = pId_k(v)$ (or $u \approx^{k} v$). Since
$sig_k(u) = sig_k(v)$, we know that for every outgoing edge
$(u, u')$ of $u$, which gives a pair
$(\lambda_E(u, u'), pId_{k-1}(u'))$ in $sig_k(u)$, we can find an
equal pair $(\lambda_E(v, v'), pId_{k-1}(v'))$ in $sig_k(v)$, such
that $pId_{k-1}(u') = pId_{k-1}(v')$ and
$\lambda_E(u, u') = \lambda_E(v, v')$. By definition, this means
$u \approx^{k} v$. Then we have $pId_k(u) = pId_k(v)$. □

Proof for Theorem 2. The I/O cost of one iteration of the
k-bisimulation computation is bounded by
$O(sort(|E_t|)) + O(scan(|N_t|))$, $k$ is a given input, and there
is one extra sort on $N_t$ in iteration 1. Hence Algorithm 1 has an
I/O complexity of
$O(k \cdot sort(|E_t|) + k \cdot scan(|N_t|) + sort(|N_t|))$.
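Written out as a sum over the $k$ iterations, the bound reads as follows. This only restates the argument above, treating the per-iteration bound as the same for every $j$.

```latex
\underbrace{O(sort(|N_t|))}_{\text{extra sort in iteration 1}}
+ \sum_{j=1}^{k} \Big( O(sort(|E_t|)) + O(scan(|N_t|)) \Big)
= O\big(k \cdot sort(|E_t|) + k \cdot scan(|N_t|) + sort(|N_t|)\big)
```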

During computation, only one $N_t$ and one $E_t$ are used,
together with $S$. The space upper bound for $S$ is the same as the
space upper bound for all signatures. Since the algorithm
constructs all signatures by joining the information from $N_t$ and
$F$ (which is a projection of $E_t$), the space upper bound of $S$
is $O(|N_t| + |E_t|)$. Therefore, the overall space complexity
upper bound of Algorithm 1 is $O(|N_t| + |E_t|)$.

We prove correctness inductively.

(1) $k = 0$. Since we are following the definition, this is
obvious.

(2) $k > 0$. Assume we get the correct $(k-1)$-bisimulation
partitioning result. In iteration $k$, for each node $u$ in $N_t$,
we construct $sig_k(u)$ and insert it into $S$ to get $pId_k(u)$.
According to Proposition 1 and the definition of $S$, we are sure
that $pId_k(u)$ is correct. □
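The inductive construction in this proof can be mirrored by a small in-memory routine. It is a sketch only: `build_bisim` and its data layout are hypothetical, a dictionary replaces $S$, and the graph used in the usage note is made up for illustration (it is not the example graph of Figure 1).

```python
def build_bisim(labels, edges, k):
    """labels: {nId: nLabel}; edges: iterable of (sId, eLabel, tId).
    Returns {nId: [pId_0, ..., pId_k]}."""
    store = {}                                   # stand-in for S: signature -> pId
    out_edges = {n: [] for n in labels}
    for s, l, t in edges:
        out_edges[s].append((l, t))
    pids = {}
    for n in sorted(labels):                     # iteration 0: group by node label
        pids[n] = [store.setdefault(("label", labels[n]), len(store) + 1)]
    for j in range(1, k + 1):                    # iteration j reads only pId_{j-1}
        for n in sorted(labels):
            sig = (pids[n][0],
                   frozenset((l, pids[t][j - 1]) for l, t in out_edges[n]))
            pids[n].append(store.setdefault(sig, len(store) + 1))
    return pids
```

On a hypothetical six-node graph with labels M, M, P, P, P, P and edges (1,w,2), (1,l,3), (2,w,1), (2,l,6), (3,l,1), (4,l,5), (5,l,2), nodes 1 and 2 share $pId_1$ but receive different $pId_2$, because their l-successors are only distinguished after one iteration.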

Proof for Theorem 3. The I/O cost of one iteration of Algorithm 4
is bounded by $O(sort(|E_t|)) + O(sort(|N_t|))$, and the upper
bound of the number of iterations is $k$. Hence Algorithm 4 has the
given I/O complexity.

During computation, only one $N_t$ and one $E_t$ are used,
together with $S$. Here the node table contains historical
information from iteration 0 to $k$, so compared with the original
$N_t$, the space upper bound is $O(k \cdot |N_t|)$. Also, according
to the algorithm, every iteration has to save its signature mapping
to $S$, so the space upper bound of $S$ is $O(k \cdot |E_t|)$.
Therefore, the overall space complexity upper bound of Algorithm 4
is $O(k \cdot |N_t| + k \cdot |E_t|)$.

Let $(s, l, t)$ be the new edge. After we insert $s$, $t$ to $N$,
$pId_0(u)$ will not change for any $u \in N$. So, according to
Definition 3, there are only two ways that $sig_j(u)$
($0 < j \le k$) could be affected:

(1) a new pair $(\lambda_E(u, v), pId_{j-1}(v))$ appears, or

(2) $pId_{j-1}(v)$ changes in some existing pair
$(\lambda_E(u, v), pId_{j-1}(v))$, where $v$ is some child of $u$.

Case (1) can only be caused by adding a new edge to $u$, so in our
case this can only happen to $sig_j(s)$ ($0 < j \le k$), and we
capture these changes in line 8 of Algorithm 4. The second case can
only happen when $pId_{j-1}$ changes for the children of $u$. We
capture (and propagate) these changes in line 20 of Algorithm 4.
Therefore, we capture all changes in the signatures of $u \in N$,
and recompute the signatures accordingly. Hence Algorithm 4
produces the correct k-bisimulation partitioning result. □

A.2 Alternative definitions for localized bisim- 
ulation 

In this section, we show the equivalence of various definitions 
of localized bisimulation that are studied in the literature. 
We have an alternative definition for k-bisimilar (13.il6l[2T] : 

Definition 4. Let fc > and G — {N,E,\n,Xe) be 
a graph. Nodes u,v £ N are called Kaushik fc-bisimilar 
(denoted as u v), iff the following holds: 

1. if k = 0, then Ajv(ii) — Xn{v). 

2. if k > 0, then: 
(aj u ~ V 

(b) W G A[(ii,ti') e E ^ 3v' e N[{v,v') G E, 
u'&.^~'^v' and Xe(u,u') = Xe{v,v')]], and 

(c) W G A[(i;,i;') € E ^ 3u' £ N[{u,u') £ E, 
v'&^~^u' and Xe(v,v') = A_B(it, it')]] . 

Proposition 5. u v iff u fa* v. 

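Proof. We first want to prove that $\approx^k \Rightarrow \simeq^k$. We will
do it inductively.

(1) $k = 0$. This is obvious, according to the definitions.

(2) $k > 0$. Assume that for nodes $u, v \in N$, $u \approx^j v \Rightarrow u \simeq^j v$ ($0 \le j < k$); we want to show $u \approx^{j+1} v \Rightarrow u \simeq^{j+1} v$.
We only need to show that $u \approx^{j+1} v \Rightarrow u \simeq^j v$. Since $u \approx^{j+1} v \Rightarrow u \approx^j v$
holds, and $u \approx^j v \Rightarrow u \simeq^j v$, we have $u \approx^{j+1} v \Rightarrow u \simeq^j v$. Then
according to Definition [4] we are done.

We then prove that $\simeq^k \Rightarrow \approx^k$. We will do it inductively.

(1) $k = 0$. This is obvious, according to the definitions.

(2) $k > 0$. Assume that for nodes $u, v \in N$, $u \simeq^j v \Rightarrow u \approx^j v$ ($0 \le j < k$); we want to show $u \simeq^{j+1} v \Rightarrow u \approx^{j+1} v$.
We only need to show that $\lambda_N(u) = \lambda_N(v)$. From $u \simeq^{j+1} v$,
we know that $u \simeq^j v$, therefore $u \approx^j v$. So $\lambda_N(u) = \lambda_N(v)$.
Proof done. □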
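As an illustration of Proposition 5, the two definitions can be transcribed directly and compared on a small graph. This is a minimal sketch under stated assumptions: the toy graph and the names `standard`, `kaushik`, and `matched` are hypothetical, and the function `standard` assumes the main definition consists of a node label test plus the two edge-matching conditions, as the proof above suggests. It is a brute-force check for small inputs, not an efficient algorithm.

```python
from functools import lru_cache

# Hypothetical toy graph: node labels, and edge labels keyed by (source, target).
NODE_LABEL = {1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
EDGE_LABEL = {(1, 3): "e", (2, 4): "e", (3, 5): "f", (5, 5): "f"}

OUT = {u: [] for u in NODE_LABEL}
for (src, dst), lab in EDGE_LABEL.items():
    OUT[src].append((lab, dst))


def matched(u, v, related):
    """Every edge (u, u') has an edge (v, v') with the same label and related(u', v')."""
    return all(
        any(lab_u == lab_v and related(u2, v2) for lab_v, v2 in OUT[v])
        for lab_u, u2 in OUT[u]
    )


@lru_cache(maxsize=None)
def standard(u, v, k):
    """Assumed main definition: node labels agree, children match at level k-1."""
    if NODE_LABEL[u] != NODE_LABEL[v]:
        return False
    if k == 0:
        return True
    rel = lambda a, b: standard(a, b, k - 1)
    return matched(u, v, rel) and matched(v, u, rel)


@lru_cache(maxsize=None)
def kaushik(u, v, k):
    """Definition 4: condition (a) asks for (k-1)-bisimilarity of u and v."""
    if k == 0:
        return NODE_LABEL[u] == NODE_LABEL[v]
    if not kaushik(u, v, k - 1):
        return False
    rel = lambda a, b: kaushik(a, b, k - 1)
    return matched(u, v, rel) and matched(v, u, rel)


# Compare both relations on every node pair for k = 0..4.
agree = all(
    standard(u, v, k) == kaushik(u, v, k)
    for u in NODE_LABEL for v in NODE_LABEL for k in range(5)
)
print(agree)
```

On this graph, nodes 1 and 2 are related at k = 1 but not at k = 2 under both definitions, because node 3 has an outgoing edge while node 4 has none.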



We also have another alternative definition for k-bisimilar:

Definition 5. Let $G = (N, E, \lambda_N, \lambda_E)$ be a graph. Let
$\mathcal{I} = \{I_1, \ldots, I_n\}$, $n > 0$, be a set of subsets of $N$. $\mathcal{I}$ is said
to partition $G$ (or, to be a partition of $G$) if its elements
are pairwise disjoint and $N = \bigcup_{I \in \mathcal{I}} I$. Partition $\mathcal{I}$ is said
to refine partition $\mathcal{J}$ (or, is a refinement of $\mathcal{J}$) if for every
$I \in \mathcal{I}$ there exists a $J \in \mathcal{J}$ such that $I \subseteq J$.

Definition 6. Let $G = (N, E, \lambda_N, \lambda_E)$ be a graph and
$\mathcal{I}$ and $\mathcal{J}$ be two partitions of $G$. $\mathcal{I}$ is said to be stable
with respect to $\mathcal{J}$ if for any $I \in \mathcal{I}$, $J \in \mathcal{J}$, and edge
label $\ell$, it holds that either $I \subseteq parents_\ell(J)$ or $I \cap parents_\ell(J) = \emptyset$
(where $parents_\ell(J) = \{y \mid \exists x \in J((y,x) \in E$ and $edgeLabel(y,x) = \ell)\}$).
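The stability condition of Definition 6 can be checked block by block. The sketch below is a straightforward in-memory transcription for small graphs; the function names `parents` and `is_stable` and the edge representation as (source, label, target) triples are assumptions made for illustration.

```python
from typing import FrozenSet, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (source, edge label, target)


def parents(block: Iterable[Hashable], edges: List[Edge], label: Hashable) -> Set[Hashable]:
    """Nodes y with an edge (y, x) carrying `label` into some x of `block`."""
    members = set(block)
    return {y for (y, lab, x) in edges if lab == label and x in members}


def is_stable(part_i: List[FrozenSet], part_j: List[FrozenSet], edges: List[Edge]) -> bool:
    """True iff every block of part_i is either contained in or disjoint from
    parents_label(J), for every block J of part_j and every edge label."""
    labels = {lab for (_, lab, _) in edges}
    for block_j in part_j:
        for lab in labels:
            p = parents(block_j, edges, lab)
            for block_i in part_i:
                if not (block_i <= p or block_i.isdisjoint(p)):
                    return False
    return True


# Toy example: a -> c and b -> d, both edges labelled "e".
toy_edges = [("a", "e", "c"), ("b", "e", "d")]
coarse = [frozenset("ab"), frozenset("c"), frozenset("d")]
fine = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")]
print(is_stable(coarse, coarse, toy_edges))  # False: {a, b} straddles parents_e({c}) = {a}
print(is_stable(fine, coarse, toy_edges))    # True: singleton blocks cannot straddle
```

Splitting the block {a, b} is exactly what a refinement step would do to restore stability with respect to the coarse partition.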

Definition 7. Let $k \ge 0$ and $G = (N, E, \lambda_N, \lambda_E)$ be a
graph. The k-partition of $G$ is defined inductively as follows:

1. if $k = 0$, then the k-partition of $G$ is the set formed by
partitioning $N$ by node labels.

2. if $k > 0$, then the k-partition of $G$ is the smallest (i.e.,
least cardinality) partition $\mathcal{I}$ of $G$ such that there exists
a $(k-1)$-partition $\mathcal{J}$ of $G$ such that $\mathcal{I}$ is a refinement
of $\mathcal{J}$ and is stable with respect to $\mathcal{J}$.

Definition 8. Let $k \ge 0$ and $G = (N, E, \lambda_N, \lambda_E)$ be a
graph. Nodes $u, v \in N$ are called Paige-Tarjan k-bisimilar
(denoted as $u \sim_k v$), iff there exists an element $B$ in the
k-partition of $G$ such that $u \in B$ and $v \in B$.

Proposition 6. $u \sim_k v$ iff $u \approx^k v$.

Proof. We first want to prove that $\sim_k \Rightarrow \approx^k$. We will do
it inductively.

(1) $k = 0$. This is obvious, since nodes are partitioned by
node labels.

(2) $k > 0$. Assume that for nodes $u, v \in N$, $u \sim_j v \Rightarrow u \approx^j v$ ($0 \le j < k$); we want to show $u \sim_{j+1} v \Rightarrow u \approx^{j+1} v$.
We only need to prove points 2 and 3 of Definition [1] ($\approx^k$).
From $u \sim_{j+1} v$, we know that $\forall u' \in N[(u,u') \in E \Rightarrow \exists v' \in N[(v,v') \in E,\ u' \sim_j v'$ and $\lambda_E(u,u') = \lambda_E(v,v')]]$,
and $\forall v' \in N[(v,v') \in E \Rightarrow \exists u' \in N[(u,u') \in E,\ v' \sim_j u'$ and $\lambda_E(v,v') = \lambda_E(u,u')]]$. Since $\sim_j \Rightarrow \approx^j$, we have
$u \approx^{j+1} v$.

We then prove that $\approx^k \Rightarrow \sim_k$. We will do it inductively.

(1) $k = 0$. This is obvious, since nodes are partitioned by
node labels.

(2) $k > 0$. Assume that for nodes $u, v \in N$, $u \approx^j v \Rightarrow u \sim_j v$ ($0 \le j < k$); we want to show $u \approx^{j+1} v \Rightarrow u \sim_{j+1} v$.
From $u \approx^{j+1} v$, we know that $\forall u' \in N[(u,u') \in E \Rightarrow \exists v' \in N[(v,v') \in E,\ u' \approx^j v'$ and $\lambda_E(u,u') = \lambda_E(v,v')]]$,
and $\forall v' \in N[(v,v') \in E \Rightarrow \exists u' \in N[(u,u') \in E,\ v' \approx^j u'$ and $\lambda_E(v,v') = \lambda_E(u,u')]]$. And since $\approx^j \Rightarrow \sim_j$, we know
that all children of $u$, $v$ who have the same edge label belong
to the same partition. This fulfills the stable condition.
From Proposition [4] we know that $\approx^{j+1}$ is a refinement of
$\approx^j$. We then have $u \sim_{j+1} v$. □



A.3 Partition splitting stop condition 

Proposition 7. If $\approx^j = \approx^{j+1}$, then $\approx^j = \approx^{j'}$ ($\forall j' > j$).

Proof. Since $\approx^j = \approx^{j+1}$, for all $u \in N$ we could assign
$pId_j(u) = pId_{j+1}(u)$. Then, according to Definition [3]
(signature) and Proposition [1], it holds that $pId_{j+2}(u) = pId_{j+1}(u)$,
and the same applies for any further $j' > j$. □

Proposition 8. The $j$ in Proposition [7] always exists,
and its upper bound is $|N|$ (number of nodes).

Proof. From Proposition [4] we know that $\forall u, v \in N$,
if $u \approx^{j+1} v$, then $u \approx^j v$, which is equivalent to saying that
partitions will either split or stay the same. If they stay the same for
one iteration, they will stay the same forever (Proposition [7]). Otherwise,
$G$ has to split at least one of its partition blocks for each $\approx^i$
where $i \le j$, in which case $j$ reaches the upper bound $|N|$. □

From Proposition [8] we know that there is an upper
bound for the number of iterations in Algorithm [1]. If this
upper bound is smaller than the user input k, the algorithm
can terminate earlier. Since the partition blocks will either
split or remain the same, the number of partition blocks will
either increase or remain the same. Therefore, by simply checking
whether two consecutive iterations produce the same number of
partition blocks, we can decide whether the computation
should stop.
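The stop condition can be sketched with a simple in-memory refinement loop. This is not the external-memory Algorithm [1] itself; it is a small illustration that assumes the signature of a node consists of its iteration-0 identifier together with the set of (edge label, previous identifier of child) pairs. The function name `k_bisim_partition` and its return convention are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (source, edge label, target)


def k_bisim_partition(node_labels: Dict[Hashable, Hashable],
                      edges: List[Edge],
                      k: int) -> Tuple[Dict[Hashable, int], int]:
    """Return (partition identifiers, number of iterations actually run)."""
    # Iteration 0: partition nodes by node label only.
    label_ids: Dict[Hashable, int] = {}
    pid0 = {u: label_ids.setdefault(lab, len(label_ids))
            for u, lab in node_labels.items()}
    pid, blocks = dict(pid0), len(label_ids)
    for j in range(1, k + 1):
        # Collect (edge label, previous identifier of child) pairs per node.
        kids = {u: set() for u in node_labels}
        for s, lab, t in edges:
            kids[s].add((lab, pid[t]))
        # Nodes with equal signatures receive the same new identifier.
        sig_ids: Dict[Hashable, int] = {}
        new_pid = {u: sig_ids.setdefault((pid0[u], frozenset(kids[u])), len(sig_ids))
                   for u in node_labels}
        if len(sig_ids) == blocks:
            # Same block count in two consecutive iterations: nothing was
            # split, so later iterations cannot change the partition either.
            return new_pid, j
        pid, blocks = new_pid, len(sig_ids)
    return pid, k


# Chain a -> b -> c -> d with identical node and edge labels.
chain_labels = {n: "x" for n in "abcd"}
chain_edges = [("a", "e", "b"), ("b", "e", "c"), ("c", "e", "d")]
ids, iterations = k_bisim_partition(chain_labels, chain_edges, 10)
print(iterations, len(set(ids.values())))  # 4 iterations, 4 blocks, although k = 10
```

Because each iteration can only split blocks, comparing block counts is enough; the identifiers themselves may be renumbered between iterations without affecting the check.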

A.4 Connection between localized bisimulation and full bisimulation

We observe the following useful connection between localized 
and full bisimulation. 

Definition 9. Let $k \ge 0$ and $G = (N, E, \lambda_N, \lambda_E)$ be a
graph. Nodes $u, v \in N$ are called bisimilar (denoted as $u \approx v$),
iff the following holds:

1. $\lambda_N(u) = \lambda_N(v)$,

2. $\forall u' \in N[(u,u') \in E \Rightarrow \exists v' \in N[(v,v') \in E,\ u' \approx v'$ and $\lambda_E(u,u') = \lambda_E(v,v')]]$, and

3. $\forall v' \in N[(v,v') \in E \Rightarrow \exists u' \in N[(u,u') \in E,\ v' \approx u'$ and $\lambda_E(v,v') = \lambda_E(u,u')]]$.

Proposition 9. Let $G = (N, E, \lambda_N, \lambda_E)$ be a graph.
There exists a $k \ge 0$ such that for any $u, v \in N$ it holds
that $u \approx^k v$ iff $u \approx v$.

Proof. First we want to show $u \approx^k v \Rightarrow u \approx v$. From
Proposition [8] we know that $k$ has an upper bound $|N|$.
Here we set $k$ to $|N|$, which means that $\approx^k = \approx^{k+1}$. Then
according to the definition, in iteration $k + 1$, for $u \approx^{k+1} v$,
we have:

1. $\lambda_N(u) = \lambda_N(v)$,

2. $\forall u' \in N[(u,u') \in E \Rightarrow \exists v' \in N[(v,v') \in E,\ u' \approx^k v'$ and $\lambda_E(u,u') = \lambda_E(v,v')]]$, and

3. $\forall v' \in N[(v,v') \in E \Rightarrow \exists u' \in N[(u,u') \in E,\ v' \approx^k u'$ and $\lambda_E(v,v') = \lambda_E(u,u')]]$.

Since $\approx^k = \approx^{k+1}$, we can replace $\approx^k$ with $\approx^{k+1}$; then the
relation $\approx^{k+1}$ has the same definition as $\approx$. So that
$u \approx^k v \Rightarrow u \approx v$.

Then we want to show that $u \approx v \Rightarrow u \approx^k v$. We will do
it inductively.

1. $k = 0$. This is obvious.

2. $k > 0$. Assume that this holds for $j - 1$; we want to
show that this also holds for $j$. Let $u \approx v$; we want
to show that $u \approx^j v$. According to the definition, we
want to have for all outgoing edges $(u, u') \in E$, there
exists some edge $(v, v') \in E$, such that $u' \approx^{j-1} v'$
and $\lambda_E(u,u') = \lambda_E(v,v')$, and vice versa. Because
of $u \approx v$, we already have $u' \approx v'$; and because of
$u \approx v \Rightarrow u \approx^{j-1} v$, we have $u' \approx^{j-1} v'$. Then all the
requirements for $u \approx^j v$ are fulfilled. So $\approx \Rightarrow \approx^k$.

□



Table 7: Experiment results of Build_Bisim() for real and synthetic datasets

| Data Set | Measurement | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 | Iteration 5 | Iteration 6 | Iteration 7 | Iteration 8 | Iteration 9 | Iteration 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Jamendo | Partition Count | 43 | 199 | 297 | 310 | 310 | 310 | 310 | 310 | 310 | 310 |
| Jamendo | Preparation Time (s) | 0.88 | 0.63 | 0.68 | 0.69 | 0.68 | 0.65 | 0.69 | 0.66 | 0.65 | 0.65 |
| Jamendo | Constructing Time (s) | 1.78 | 2.05 | 2.16 | 2.17 | 2.18 | 2.17 | 2.14 | 2.38 | 2.42 | 2.40 |
| Jamendo | Table Read (byte) | 111,149,056 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 | 75,497,472 |
| Jamendo | Table Write (byte) | 113,246,208 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 | 77,594,624 |
| Jamendo | S Read (byte) | 8,192 | | | | | | | | | |
| Jamendo | S Write (byte) | 4,096 | 40,960 | 69,632 | 98,304 | 122,880 | 163,840 | 176,128 | 237,568 | 241,664 | 278,528 |
| Jamendo | Max Signature Length | 21 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| LinkedMDB | Partition Count | 8,460 | 38,291 | 71,161 | 85,327 | 85,660 | 85,692 | 85,704 | 85,707 | 85,709 | 85,711 |
| LinkedMDB | Preparation Time (s) | 5.78 | 4.88 | 4.94 | 4.77 | 4.86 | 5.87 | 4.91 | 4.90 | 4.79 | 4.80 |
| LinkedMDB | Constructing Time (s) | 12.29 | 13.58 | 13.73 | 14.49 | 14.00 | 14.56 | 15.58 | 14.46 | 14.51 | 16.05 |
| LinkedMDB | Table Read (byte) | 731,906,048 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 | 597,688,320 |
| LinkedMDB | Table Write (byte) | 884,998,144 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 | 752,877,568 |
| LinkedMDB | S Read (byte) | 8,192 | | | | | | | | | |
| LinkedMDB | S Write (byte) | 1,982,464 | 7,389,184 | 16,576,512 | 24,403,968 | 33,886,208 | 45,350,912 | 57,716,736 | 68,988,928 | 76,664,832 | 80,326,656 |
| LinkedMDB | Max Signature Length | 63 | 179 | 203 | 229 | 243 | 243 | 243 | 243 | 243 | 243 |
| DBLP | Partition Count | 246 | 9,073 | 11,130 | 11,189 | 11,189 | 11,189 | 11,189 | 11,189 | 11,189 | 11,189 |
| DBLP | Preparation Time (s) | 59.64 | 43.41 | 44.10 | 44.50 | 45.34 | 46.33 | 46.58 | 48.03 | 46.88 | 46.65 |
| DBLP | Constructing Time (s) | 98.53 | 112.46 | 114.80 | 117.44 | 118.09 | 116.79 | 117.96 | 117.52 | 119.69 | 118.03 |
| DBLP | Table Read (byte) | 8,044,675,072 | 5,731,516,416 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 | 5,733,613,568 |
| DBLP | Table Write (byte) | 9,353,297,920 | 7,092,568,064 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 | 7,096,762,368 |
| DBLP | S Read (byte) | 8,192 | | | | | | | | | |
| DBLP | S Write (byte) | 53,248 | 2,854,912 | 3,956,736 | 4,546,560 | 5,300,224 | 7,876,608 | 10,571,776 | 13,393,920 | 15,962,112 | 18,608,128 |
| DBLP | Max Signature Length | 37 | 99 | 723 | 745 | 745 | 745 | 745 | 745 | 745 | 745 |
| Wikilinks | Partition Count | 2 | 4 | 14 | 327 | 928,765 | 2,992,705 | 3,596,837 | 3,604,409 | 3,605,063 | 3,605,151 |
| Wikilinks | Preparation Time (s) | 137.65 | 108.19 | 107.01 | 106.82 | 108.28 | 117.37 | 115.32 | 115.57 | 116.89 | 116.30 |
| Wikilinks | Constructing Time (s) | 17.92 | 19.71 | 20.31 | 28.88 | 62.53 | 193.49 | 436.94 | 441.90 | 614.60 | 632.42 |
| Wikilinks | Table Read (byte) | 15,065,939,968 | 10,798,235,648 | 10,831,790,080 | 10,888,413,184 | 11,156,848,640 | 12,318,670,848 | 12,580,814,848 | 12,593,397,760 | 12,595,494,912 | 12,595,494,912 |
| Wikilinks | Table Write (byte) | 17,205,035,008 | 12,939,427,840 | 13,006,536,704 | 13,119,782,912 | 13,656,653,824 | 15,980,298,240 | 16,504,586,240 | 16,529,752,064 | 16,533,946,368 | 16,533,946,368 |
| Wikilinks | S Read (byte) | 8,192 | | | | 24,797,184 | 8,697,421,824 | 12,882,042,880 | 16,277,565,440 | 17,842,032,640 | 19,003,453,440 |
| Wikilinks | S Write (byte) | 4,096 | 4,096 | 4,096 | 36,864 | 205,180,928 | 8,431,570,944 | 12,294,479,872 | 15,117,230,080 | 15,919,968,256 | 16,244,105,216 |
| Wikilinks | Max Signature Length | 3 | 5 | 9 | 19 | 129 | 6,817 | 8,349 | 9,363 | 9,421 | 9,425 |
| Dbpedia | Partition Count | 362,128 | 2,357,366 | 3,239,710 | 3,273,445 | 3,281,100 | 3,299,007 | 3,343,927 | 3,401,435 | 3,436,428 | 3,450,357 |
| Dbpedia | Preparation Time (s) | 146.29 | 108.95 | 113.45 | 116.87 | 114.84 | 116.22 | 116.30 | 116.28 | 114.72 | 119.99 |
| Dbpedia | Constructing Time (s) | 213.61 | 366.58 | 466.13 | 585.40 | 632.35 | 664.99 | 679.87 | 763.97 | 863.79 | 1,117.25 |
| Dbpedia | Table Read (byte) | 16,760,438,784 | 12,366,905,344 | 12,574,523,392 | 12,595,494,912 | 12,605,980,672 | 12,616,466,432 | 12,629,049,344 | 12,639,535,104 | 12,643,729,408 | 12,643,729,408 |
| Dbpedia | Table Write (byte) | 19,295,895,552 | 15,453,913,088 | 15,869,149,184 | 15,911,092,224 | 15,932,063,744 | 15,953,035,264 | 15,978,201,088 | 15,999,172,608 | 16,007,561,216 | 16,007,561,216 |
| Dbpedia | S Read (byte) | 8,192 | 3,870,638,080 | 5,215,023,104 | 5,915,021,312 | 6,404,620,288 | 7,598,112,768 | 8,796,708,864 | 9,225,072,640 | 10,492,932,096 | 11,405,717,504 |
| Dbpedia | S Write (byte) | 123,658,240 | 4,553,515,008 | 5,857,165,312 | 6,507,692,032 | 6,952,050,688 | 8,115,949,568 | 9,237,364,736 | 9,629,892,608 | 10,667,528,192 | 11,225,329,664 |
| Dbpedia | Max Signature Length | 1,501 | 5,109 | 7,687 | 8,179 | 8,213 | 8,215 | 8,269 | 8,269 | 8,269 | 8,269 |
| Twitter | Partition Count | 2 | 4 | 16 | 1,463 | 14,251,228 | 35,729,811 | 36,178,375 | 36,192,245 | 36,192,750 | 36,192,805 |
| Twitter | Preparation Time (s) | 4,980.11 | 4,221.10 | 4,226.36 | 4,310.65 | 4,290.77 | 4,577.37 | 4,554.91 | 4,446.23 | 4,410.29 | 4,422.27 |
| Twitter | Constructing Time (s) | 170.97 | 215.58 | 260.66 | 362.50 | 1,795.55 | 3,881.94 | 3,876.89 | 3,984.64 | 4,051.14 | 5,012.17 |
| Twitter | Table Read (byte) | 168,455,831,552 | 120,275,861,504 | 120,674,320,384 | 121,194,414,080 | 124,528,885,760 | 141,601,800,192 | 142,751,039,488 | 142,753,136,640 | 142,753,136,640 | 142,753,136,640 |
| Twitter | Table Write (byte) | 192,552,108,032 | 144,531,521,536 | 145,328,439,296 | 146,368,626,688 | 153,037,570,048 | 187,183,398,912 | 189,481,877,504 | 189,486,071,808 | 189,486,071,808 | 189,486,071,808 |
| Twitter | S Read (byte) | 8,192 | | | | 33,206,579,200 | 130,105,450,496 | 116,607,520,768 | 137,478,197,248 | 151,362,093,056 | 162,154,037,248 |
| Twitter | S Write (byte) | 4,096 | 4,096 | 4,096 | 155,648 | 34,853,953,536 | 115,356,168,192 | 110,612,774,912 | 119,332,966,400 | 121,634,852,864 | 123,151,667,200 |
| Twitter | Max Signature Length | 3 | 5 | 9 | 33 | 1,373 | 4,354,479 | 5,840,263 | 5,848,053 | 5,848,119 | 5,848,119 |
| SP2B | Partition Count | 728 | 219,581 | 459,986 | 467,369 | 467,369 | 467,369 | 467,369 | 467,369 | 467,369 | 467,369 |
| SP2B | Preparation Time (s) | 1,238.28 | 859.67 | 842.68 | 850.36 | 851.59 | 831.17 | 841.14 | 877.76 | 847.49 | 854.87 |
| SP2B | Constructing Time (s) | 1,392.42 | 1,670.65 | 1,824.11 | 1,929.69 | 2,066.06 | 2,152.50 | 2,265.44 | 2,248.45 | 2,226.88 | 2,337.82 |
| SP2B | Table Read (byte) | 105,736,306,688 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 | 68,232,937,472 |
| SP2B | Table Write (byte) | 120,431,050,752 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 | 82,931,875,840 |
| SP2B | S Read (byte) | 8,192 | | | 2,285,568 | 26,083,328 | 221,638,656 | 390,049,792 | 425,611,264 | 443,654,144 | 446,136,320 |
| SP2B | S Write (byte) | 118,784 | 62,963,712 | 97,890,304 | 136,470,528 | 196,829,184 | 387,956,736 | 495,534,080 | 514,424,832 | 523,583,488 | 523,796,480 |
| SP2B | Max Signature Length | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
| BSBM | Partition Count | 50 | 510 | 511 | 512 | 512 | 512 | 512 | 512 | 512 | 512 |
| BSBM | Preparation Time (s) | 38.54 | 28.52 | 28.14 | 27.88 | 28.07 | 27.92 | 27.96 | 27.79 | 28.03 | 27.86 |
| BSBM | Constructing Time (s) | 59.91 | 61.32 | 59.62 | 63.29 | 63.51 | 64.26 | 64.11 | 65.09 | 64.40 | 65.31 |
| BSBM | Table Read (byte) | 5,179,965,440 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 | 3,764,387,840 |
| BSBM | Table Write (byte) | 6,228,541,440 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 | 4,819,255,296 |
| BSBM | S Read (byte) | 8,192 | | | | | | | | | |
| BSBM | S Write (byte) | 16,384 | 106,496 | 110,592 | 167,936 | 270,336 | 442,368 | 405,504 | 585,728 | 573,440 | 499,712 |
| BSBM | Max Signature Length | 35 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |

Table 8: Sum-up of experiment results from Table [7]

| Measurement | Jamendo | LinkedMDB | DBLP | Wikilinks | Dbpedia | Twitter | SP2B | BSBM |
|---|---|---|---|---|---|---|---|---|
| Partition Count / Node Count | 0.064% | 3.677% | 0.049% | 63.127% | 8.935% | 86.893% | 0.166% | 0.006% |
| Elapsed Time (s) | 28.72712 | 193.748 | 1622.765 | 3618.1302 | 7537.846 | 68052.12 | 29009.04 | 921.5325 |
| Overall I/O (byte) | 1.6E+09 | 1.42E+10 | 1.33E+11 | 4.164E+11 | 4.34E+11 | 4.451E+12 | 1.59E+12 | 8.87E+10 |
| Max Signature Length / Node Count | 0.00473% | 0.01043% | 0.00324% | 0.16503% | 0.02141% | 14.04035% | 0.00004% | 0.00042% |

